Abstract. Many practical studies rely on hypothesis testing procedures applied to data
sets with missing information. An important part of the analysis is to determine the
impact of the missing data on the performance of the test, and this can be done by properly
quantifying the relative (to complete data) amount of available information. The problem
is directly motivated by applications to studies, such as linkage analyses and
haplotype-based association projects, designed to identify genetic contributions to complex
diseases. In the genetic studies the relative information measures are needed for the
experimental design, technology comparison, interpretation of the data, and for
understanding the behavior of some of the inference tools. The central difficulties in
constructing such information measures arise from the multiple, and sometimes conflicting,
aims in practice. For large samples, we show that a satisfactory, likelihood-based general
solution exists by using appropriate forms of the relative Kullback-Leibler information,
and that the proposed measures are computationally inexpensive given the maximized
likelihoods with the observed data. Two measures are introduced, under the null and
alternative hypothesis respectively. We exemplify the measures on data coming from mapping
studies on the inflammatory bowel disease and diabetes. For small-sample problems, which
appear rather frequently in practice and sometimes in disguised forms (e.g., measuring
individual contributions to a large study), the robust Bayesian approach holds great
promise, though the choice of a general-purpose "default prior" is a very challenging
problem. We also report several intriguing connections encountered in our investigation,
such as the connection with the fundamental identity for the EM algorithm, the connection
with the second CR (Chapman-Robbins) lower information bound, the connection with entropy,
and connections between likelihood ratios and Bayes factors. We hope that these seemingly
unrelated connections, as well as our specific proposals, will stimulate a general
discussion and research in this theoretically fascinating and practically needed area.

1. MANY CHALLENGES AND AN OVERVIEW
1.1 General Challenges 

The central aim of this paper is to establish, in the
context of hypothesis testing with incomplete data, 
a general framework for quantifying the amount of 
information in the observed data for a specific test 
being performed, relative to the full amount of infor- 
mation we would have had the data been complete. 
We do not address the issue of what is the best test- 
ing procedure, with or without the complete data, 
nor the issue of whether a full modeling/estimation 
strategy should or can be used instead. Rather, we 
address an increasingly common practical problem 
where the investigator has chosen the testing proce- 
dure, but needs to know the impact of the missing 
data on the test in terms of the relative loss of in- 
formation. Such is the case in the genetic studies we 
briefly review in Sections 2 and 3. 

Besides the specific challenges listed in Section 1.2, 
there are a number of general theoretical and method- 
ological difficulties for establishing this general frame- 
work. First, unlike the similar task for estimation, 
where the notion of "fraction of missing information"
is well studied and documented (e.g., Dempster,
Laird and Rubin (1977); Meng and Rubin (1991)),
for hypothesis testing, there are two sets of measures
to be contemplated, depending on whether the null 
hypothesis or the posited alternative model can be 
regarded as approximately adequate. Indeed, this is 
the very question the hypothesis test aims to provide 
partial evidence to discriminate. 

Second, hypothesis testing procedures, especially 
those of nonparametric or semiparametric nature, 
are often constructed without reference to a spe- 
cific (parametric) model. However, without an ex- 
plicit model to link the unobserved quantities with 
the observed data, the very task of measuring how 
much information we have missed is neither possible 
in general nor meaningful. It is known, though not 
widely (e.g., Chernoff (1979); Meng (2001)), that 
certain robust statistical procedures for estimation 
or testing can produce more efficient or powerful re- 
sults with less data. Consequently, without assum- 
ing that our testing procedure is optimal under a 
specified optimality criterion, we may end up with 
the seemingly paradoxical situation that additional 
data may make our procedure less efficient or pow- 
erful. That is, we may declare that more information 
is available with less data. 



Third, in the context of small samples, quanti- 
fying information requires going beyond convenient 
and standard measures such as Fisher information, 
which is essentially a large-sample measure. Small- 
sample problems are rather frequent with incom- 
plete data, as missing data reduce effective sample 
sizes. For the genetic studies we investigate in this 
paper, the small-sample problems arise even when 
there appear to be ample amounts of data. For ex- 
ample, we are often interested in measuring infor- 
mation content in individual components (e.g., an 
individual family in a large linkage study). In hap- 
lotype association studies, we often test haplotypes 
individually — data size may be large enough for test- 
ing a common haplotype, but very small for a rare 
one. In addition, an individual person can be fully 
informative for one haplotype because we know s/he 
cannot carry it, but much less so for another when 
we are uncertain whether s/he carries it or not. All 
these problems remind us that, in general scientific
studies, small-sample problems appear more often
than meets the eye (that is, more often than the
numerical value of the sample size suggests), because
they sometimes appear in disguised forms.

Given the complex nature of small-sample problems
requiring information measures, we literally have
spent several years in our quest of finding a general
workable approach. Not surprisingly, our conclusion
is that robust Bayesian methods hold more
promise. As we propose in Section 5, after estab- 
lishing a likelihood-based large-sample framework in 
Section 4, this problem can be dealt with by consid- 
ering posterior measures of the flatness of the entire 
likelihood surfaces. However, the problem of specify- 
ing an appropriate "default" prior is challenging. We 
report both our promising findings and open prob- 
lems, hoping to stimulate further development on 
this practically important and theoretically fascinat- 
ing topic. We also discuss various interesting theo- 
retical connections (Section 6), as well as further 
methodological work and applications (Section 7). 

1.2 Conflicting Aims in Genetic Studies 

The central applied problem that motivated our 
work was the task to sensibly measure and efficiently 
compute the amount of information available in a 
particular genetic data set for a particular hypoth- 
esis tested by a particular statistical procedure. All 
genome-wide linkage screens carried out on qualita- 
tive and quantitative traits as well as most of the as- 
sociation studies extract only part of the underlying 
information. Missing information can be the result
of different sources, such as absence of DNA sam- 
ples, missing genotypes, spacing between markers, 
noninformativeness of the markers, or unknown hap- 
lotype phase. Investigators want to know how much 
information is available in the observed data for the 
purpose of the study relative to the amount of infor- 
mation that would have been available if the data 
were complete. The notion of complete data is prob- 
lem specific and, in parametric inference, depends 
on the sufficient statistics; for example, in linkage 
studies where the IBD (identical by descent) pro- 
cess is sufficient for inference, complete data can be 
achieved even if genotypes and/or individual sam- 
ples are missing. Measures of relative information 
are needed for designing follow-up strategies in link- 
age studies, for example, using more genetic markers 
with existing DNA samples versus collecting DNA 
samples from additional families. Even for situations 
where we do not intend to recover the missing data, 
including situations where they cannot possibly be 
recovered (e.g., DNA samples from deceased ances- 
tors), such measures can still be useful for the in- 
terpretation of the data and of the results, and for 
understanding the behavior of some of the inference 
tools (e.g., see Section 4.5). 

The key methodological challenge is to find a mea- 
sure that (1) is a reliable index of the relative infor- 
mation specific to a study purpose, (2) conditions 
on particular data sets, (3) is robust in the sense 
of general applicability, including to small data sets, 
(4) is easy to compute and (5) is subject to mean- 
ingful combination axioms. The reliability criterion 
(1) is obvious, and the criterion (2) is necessary be- 
cause typically an investigator is interested in mea- 
suring the relative information in the data set at 
hand, not with respect to some "average" data set. 
Criterion (3) is desirable because in a typical genetic 
linkage study one needs to deal with a large amount 
of data with a variety of different complex structures 
(e.g., from a nuclear family to a very complex pedi- 
gree), often under time constraints, and thus it is 
not feasible to design separate measures to suit par- 
ticular data structures. Criterion (4) is needed for 
similar reasons — any method without suitable com- 
putational efficiency, regardless of its theoretical su- 
periority, will typically be ignored in routine genetic 
studies given the practical constraints. Criterion (5) 
ensures certain desirable coherence to prevent
paradoxical measure properties (e.g., more informative
studies receive less weight in the combined index)
when combining studies.

To deal with all these criteria simultaneously re- 
quires a careful combination of Bayesian and fre- 
quentist perspectives. Some of the criteria [e.g., (1) 
and (2)] are most easily handled from the Bayesian 
perspective, and some [e.g., (5)] are easier to satisfy 
with a frequentist criterion. With large samples, as is
typical, likelihood theory provides a rather satisfactory
solution, as we demonstrate in Section 4. For
small samples, we have not been able to find a bet- 
ter alternative than to follow a robust Bayesian per- 
spective, which takes full advantage of the Bayesian 
formulation in deriving information measures with 
desirable coherent properties, and at the same time 
it seeks measures that are robust to various misspec- 
ifications and are thus more generally applicable. We 
emphasize, however, that the computational burden
associated with these Bayesian measures should not be
overlooked, even in this age of the MCMC revolution,
for the reasons underlying criterion (4) above.
Nevertheless, it is more principled and fruitful to 
seek ways to increase computational efficiency after 
we establish theoretically sound measures. This is 
the route we follow. 

1.3 Imputing Under the Null or 
Not — Gaining Insight 

For those who have no (direct) interest in genetic 
studies, the following simple example may provide a 
stimulus to follow the methods developed in our pa- 
per. The example also provides some insights into 
a somewhat "perplexing" practical question when 
dealing with hypothesis testing in the presence of 
missing data: shall we impute under the null or not? 
We emphasize that the purpose of this example is 
not to illustrate imputation methods. Indeed, nei- 
ther method discussed below can be recommended 
in general. Rather, it shows how we can quantify
relative information by measuring how inaccurate it is
to erroneously treat imputations as if they were
observed data.

Specifically, suppose $y_1, \ldots, y_n$ are i.i.d. realizations
of Bernoulli($p$), but only $n_0 < n$ of them are
actually observed. Assuming that the missing data
are missing completely at random (Rubin (1976)),
we can denote the observed data by $y_1, \ldots, y_{n_0}$.
Evidently, a simple large-sample test (assuming $n_0$ is
adequately large) for $H_0: p = p_0$ is to refer the test
statistic (where the subscript "ob" stands for "observed
data")

(1)    $T_{ob} = \dfrac{\bar{y}_{ob} - p_0}{\sqrt{p_0(1 - p_0)/n_0}}$

to the null distribution $N(0, 1)$, where $\bar{y}_{ob}$ is the
average of the observed data.

Let us assume that the missing y's were imputed
using two mean-imputation methods. The first
method is to impute each missing y by its mean,
estimated by $\bar{y}_{ob}$. The second procedure is to impute
each missing y by its mean assuming $H_0$ is true,
that is, by $p_0$. Clearly, with either imputation, if we
treat the imputed data as if they were observed and
apply the test (1) with $n_0 = n$, we will not reach the
valid conclusion unless we adjust the null distribution
$N(0, 1)$.

For the first method, the average of all data,
observed and imputed, is $\bar{y}_1 = \bar{y}_{ob}$. Therefore, if we
erroneously treat the imputed values as real
observations, we would compute our test statistic as

(2)    $T_1 = \dfrac{\bar{y}_1 - p_0}{\sqrt{p_0(1 - p_0)/n}} = \dfrac{1}{\sqrt{r}}\, T_{ob}$,

where $r = n_0/n$. In contrast, the second method would
lead to

(3)    $T_0 = \dfrac{\bar{y}_0 - p_0}{\sqrt{p_0(1 - p_0)/n}} = \sqrt{r}\, T_{ob}$,

because the average of all data, observed and
imputed, is $\bar{y}_0 = r\bar{y}_{ob} + (1 - r)p_0$.
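
As a numerical check of the two proportionality relations, the following minimal Python sketch computes the observed-data statistic and the two "completed-data" statistics for a toy data set. The function names and the toy data are illustrative assumptions, not part of the original analysis.

```python
import math


def z_statistic(mean, p0, n):
    """Large-sample statistic (mean - p0) / sqrt(p0 (1 - p0) / n)."""
    return (mean - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def imputation_tests(y_obs, n, p0):
    """Return (r, T_ob, T_1, T_0) for n0 = len(y_obs) observed values out of n.

    T_1 treats values imputed by the observed mean as real data;
    T_0 treats values imputed by the null mean p0 as real data.
    """
    n0 = len(y_obs)
    r = n0 / n
    ybar_ob = sum(y_obs) / n0
    t_ob = z_statistic(ybar_ob, p0, n0)

    # Method 1: every missing value is replaced by the observed mean.
    ybar_1 = (n0 * ybar_ob + (n - n0) * ybar_ob) / n
    t_1 = z_statistic(ybar_1, p0, n)

    # Method 2: every missing value is replaced by the null mean p0.
    ybar_0 = (n0 * ybar_ob + (n - n0) * p0) / n
    t_0 = z_statistic(ybar_0, p0, n)
    return r, t_ob, t_1, t_0


# Toy data (hypothetical): 50 of 100 values observed, 30 successes.
r, t_ob, t_1, t_0 = imputation_tests([1] * 30 + [0] * 20, n=100, p0=0.5)
print(r, (t_ob / t_1) ** 2, (t_0 / t_ob) ** 2)
```

In this toy case both squared ratios recover the fraction of observed data, which is the sense in which the inflation of the first statistic and the deflation of the second carry the same information about $r$.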

Two aspects of the above calculations are important.
First, in both cases, the resulting "completed-data"
test statistic is proportional to the benchmark
given in (1). Consequently, imputing under the null
or not leads to the same answer, as long as we adjust
the corresponding null distribution accordingly
(the generality of this equivalence result obviously
needs qualification, but the validity of a test is
automatic when its null reference distribution is
correctly specified). Second, identities (2) and (3) yield
respectively

(4)    $r = \left(\dfrac{T_{ob}}{T_1}\right)^2$  and  $r = \left(\dfrac{T_0}{T_{ob}}\right)^2$,

where $T_{ob}$ is the observed-data statistic in (1) and
$T_1$, $T_0$ are the completed-data statistics in (2) and (3).

The results in (4) are important because $r = n_0/n$
measures the relative sample sizes, and hence the
"relative information" in an i.i.d. setting. These
results suggest that we consider measuring the relative
information by how liberal the first imputation-based
test is, when the imputations under the alternative
are treated as real data, or how conservative
the second test is, when the imputations under the
null are treated as real observations. Our general
large-sample results given in Section 4 show that
these ideas are in fact general, once we replace the
statistics in (4) by their appropriate log-likelihood
ratio counterparts (recall the large-sample equivalence
between log-likelihood ratio statistics and the
Wald statistics in a form similar to $T^2$). Readers
who are not interested in genetic applications can
go directly to Section 4, as Sections 2 and 3 provide
the necessary background on the genetic problems
to which our methods will be applied.

2. GENETIC LINKAGE ANALYSIS
2.1 Allele-Sharing Methods

Linkage refers to the co-inheritance of two markers
or genes because they are located closely on
the same chromosome. Allele-sharing methods are
part of linkage techniques for locating regions on the
genome that are very likely to contain disease
susceptibility genes (e.g., Ott (1991)). The data usually
consist of genotypes from a large number of
markers (polymorphic locations) spread along the
genome for individuals from n pedigrees. The
allele-sharing methods focus on affected individuals, but
genetic data on unaffected relatives are used to infer
the inheritance patterns. Alleles at the same locus
in two individuals are said to be identical by descent
(IBD) if they originate from the same chromosome,
and are called identical by state (IBS) if they appear
to be the same. For a given location on the genome,
the evidence for a disease-susceptibility locus linked
to it is given by the sharing of alleles IBD among
affected relatives in excess of what is expected if the
marker is not linked to a genetic risk factor.

The simplest example of a data structure is the
affected sib pair, as shown in Figure 1, where the left
diagram shows a family with two affected brothers
in which the parental information at a fixed locus is
denoted by "A1" and "A2" for the father, and "A3"
and "A4" for the mother. The siblings have one allele
IBD (A2) which they inherited from their father,
and different alleles inherited from their mother. In
general, siblings share either two, one or no alleles
IBD. Unconditionally, each allele has probability
1/2 to be transmitted; this leads to a probability of
1/4, 1/2, 1/4 for sharing zero, one, two alleles,
respectively, identical by descent. Conditioned on the
affection status of the sibs, in the neighborhood of
a disease gene, there is an expected increase in the
number of alleles IBD across a collection of sib pairs;
statistical testing methods are often used to measure
the strength of the evidence.

Fig. 1. Pedigree diagrams of an affected sib pair; the IBD
sharing is known for the sibs in the left diagram, but only the
IBS sharing is known for the sibs in the right diagram.

In general, the data are not as simple as in the 
above example. The pedigree structures can con- 
tain far more complicated relations than sib pairs 
and more than two affected individuals. Most of the 
data sets extract only part of the underlying IBD 
information. In general, the information is incom- 
plete at locations between markers. Even at marker 
locations, a variety of factors can lead to missing in- 
formation, including any genotype data on deceased 
or unavailable family members, missing genotypes in 
the typed individuals, or noninformativeness of the 
markers. The right diagram of Figure 1 illustrates a 
family where the parental allele information is miss- 
ing, so even though the allele sharing among the 
sib pair appears to be identical in pattern with that 
of the left diagram, it is not known if the sibs share 
one or zero alleles IBD as the two "A2" alleles might 
originate on different parental chromosomes. 

In general, the marker information of all the loci 
on the chromosome is used to calculate a probabil- 
ity distribution on the space of inheritance vectors. 
For locus t and pedigree i, an inheritance vector,
$\omega_i = \omega_i(t)$, is a binary vector that specifies, for all
the nonfounding members of the pedigree, which 
grand-parental alleles are inherited. Under the as- 
sumption of no linkage, all inheritance vectors are 
equally likely, which leads to a uniform prior distri- 
bution on their space. For a sib pair, the inheritance 
vector has four elements, one for each parent-child 
combination. For example, the first element speci- 
fies whether the allele inherited by the first sib from 
his father originates from the grandfather or grand- 
mother. Assuming no interference (Ott (1991)), a 
Hidden Markov Model can be used to calculate the 
inheritance distribution conditional on the genotypes 
at all marker loci (Lander and Green (1987)). The 
distribution of the inheritance vectors conditional
on the observed data is the basis of the statistical
inference, and it is used to determine the conditional
distribution of the number of alleles IBD at a given
location.
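
To make the sib-pair case concrete, the following minimal Python sketch enumerates the 16 equally likely inheritance vectors under no linkage and tabulates the number of alleles shared IBD. The ordering of the four elements and the 0/1 coding of grandparental origin are assumptions made here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ibd_count(vector):
    """Number of alleles shared IBD by two sibs.

    vector = (sib1 paternal, sib1 maternal, sib2 paternal, sib2 maternal);
    each element is 0 or 1 according to which grandparental allele was
    transmitted (assumed coding).
    """
    s1_pat, s1_mat, s2_pat, s2_mat = vector
    return int(s1_pat == s2_pat) + int(s1_mat == s2_mat)


def null_ibd_distribution():
    """IBD distribution when all inheritance vectors are equally likely."""
    vectors = list(product((0, 1), repeat=4))
    counts = Counter(ibd_count(v) for v in vectors)
    return {k: Fraction(c, len(vectors)) for k, c in sorted(counts.items())}


print(null_ibd_distribution())
# {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

The uniform distribution on inheritance vectors therefore reproduces the 1/4, 1/2, 1/4 null probabilities for sharing zero, one or two alleles IBD.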

2.2 Hypothesis Testing Using Imputed 
Sharing Scores 

In order to summarize the evidence for linkage
in a pedigree, we can use a score $S_i$ (Whittemore
and Halpern (1994); Kruglyak et al. (1996)), a measure
of IBD sharing among the affected individuals
at locus t. In general, $S_i$ is chosen such that
it has a higher expected value under linkage than
under no linkage. The standardized form of $S_i$ is
$Z_i = (S_i - \mu_i)/\sigma_i$, where $\mu_i = E(S_i | H_0)$ and
$\sigma_i^2 = \mathrm{Var}(S_i | H_0)$. The test is typically in the form
of a linear combination over the n pedigrees,

(5)    $Z = \dfrac{\sum_{i=1}^{n} \gamma_i Z_i}{\sqrt{\sum_{i=1}^{n} \gamma_i^2}}$,

where $\gamma_i \ge 0$ are weights assigned to the individual
families. The weights can be chosen according to
the number of affecteds and the relationship among
them and/or other covariate information. Under the
null hypothesis, $Z$ has mean 0 and variance 1.
Deviations from the null hypothesis can be tested using
a $N(0, 1)$ approximation or the exact distribution of
$Z$.

In general, the $Z_i$'s are not directly observable due to
missing information. A common practice is to
impute/replace $Z_i$ by $W_i = E(Z_i | \text{data}, H_0)$ to construct
a test statistic (Kruglyak et al. (1996)),

(6)    $W = \dfrac{\sum_{i=1}^{n} \gamma_i W_i}{\sqrt{\sum_{i=1}^{n} \gamma_i^2}} = E(Z | \text{data}, H_0)$.

The main problem with this test statistic is the
difficulty of directly evaluating its statistical
significance. A standard $N(0, 1)$ approximation can be
very inaccurate when there is a large amount of
missing information, as can be seen from the
following variance decomposition:

(7)    $\mathrm{Var}(Z | H_0) = \mathrm{Var}(E(Z | \text{data}, H_0) | H_0) + E(\mathrm{Var}(Z | \text{data}, H_0) | H_0)$,

which implies

(8)    $\mathrm{Var}(W | H_0) = 1 - E(\mathrm{Var}(Z | \text{data}, H_0) | H_0) \le 1$.

In many cases $\mathrm{Var}(W | H_0)$ can be substantially less
than 1, leading to a conservative test when the $N(0, 1)$
approximation is used. A more accurate approach is
described in Section 2.3.
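
The variance reduction can be seen in a deliberately simple setting. The Python sketch below assumes a hypothetical missingness mechanism in which a sib pair's IBD count is either fully observed (with probability q) or completely unobserved, so that the imputed score is either the standardized score itself or its null mean of zero. This mechanism and the function names are assumptions for illustration and are much cruder than the marker-based missingness in real linkage data.

```python
import math

# Null distribution of the number of alleles shared IBD by a sib pair.
NULL_IBD = {0: 0.25, 1: 0.5, 2: 0.25}


def standardized_scores(pmf):
    """Map each score s to (s - mean) / sd under the given null pmf."""
    mu = sum(s * p for s, p in pmf.items())
    var = sum((s - mu) ** 2 * p for s, p in pmf.items())
    return {s: (s - mu) / math.sqrt(var) for s in pmf}


def variance_of_imputed_score(q, pmf=NULL_IBD):
    """Null variance of W = E(Z | data, H0) under all-or-nothing missingness.

    With probability q the score is observed (W = Z); otherwise nothing is
    observed and W equals the null mean of Z, which is 0.
    """
    z = standardized_scores(pmf)
    mean_w = sum(q * p * z[s] for s, p in pmf.items())
    second_moment = sum(q * p * z[s] ** 2 for s, p in pmf.items())
    return second_moment - mean_w ** 2


for q in (1.0, 0.6, 0.2):
    print(q, variance_of_imputed_score(q))
```

Under this assumed mechanism the null variance of the imputed score equals q, so referring W to $N(0, 1)$ understates significance whenever q is less than 1.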

It is important to emphasize that, in allele-sharing 
studies, the amount of missing information can be 
made arbitrarily low, at least in theory, by increas- 
ing the number of markers in the region. That is 
why, in regions with evidence for linkage, it is im- 
portant to predict whether by genotyping additional 
markers one will obtain a more significant devia- 
tion from the null. A different strategy for increasing 
the amount of information is to increase the sample 
size, that is, to collect DNA samples from additional 
families. Therefore knowing how much information 
is missing from the data is important for designing 
efficient follow-up strategies (see also Nicolae and 
Kong (2004)). 

2.3 Associating a Test With a Model 

The linkage methods we described are based on a
chosen test statistic. In order to measure the relative
information for a test statistic, we need to associate
it with a model which specifies the stochastic
relationship between the observed data and missing
data beyond the null. Otherwise the question
of relative information is not well defined, as
emphasized in Section 1.1. It has been shown by
Kong and Cox (1997) that for every test statistic 
of the form of (5), a class of one-parameter models 
can be constructed such that the efficient score (Cox 
and Hinkley (1974)) from each of the models gives 
asymptotically equivalent results to the given statis- 
tic. The inference procedures based on these models 
can be applied to any pedigree structure and missing 
data patterns. 

As an illustration, we briefly describe the exponential
tilting model of Kong and Cox (1997) applied
to the one-locus allele-sharing statistic. A key
assumption underlying this model (and other models
for associating tests) is that the distribution of
the inheritance vectors satisfies

(9)    $\dfrac{P(\omega_i | H_a)}{P(\omega_i | H_0)} = \dfrac{P(Z_i = z(\omega_i) | H_a)}{P(Z_i = z(\omega_i) | H_0)}$,

where $\omega_i$ is an inheritance vector for pedigree i that
leads to a standardized scoring function equal to
$z(\omega_i)$, and $H_a$ denotes the alternative hypothesis.
Note that any time an investigator employs a test
solely based on the $Z$'s, as far as measuring information
is concerned, s/he is effectively assuming (9)
regardless of whether or not s/he is aware of it.



Under assumption (9), it is sufficient to define the
alternative models for the $Z_i$'s. The exponential tilting
model has the form

(10)    $P_\theta(Z_i = z) = P_0(Z_i = z)\, c_i(\theta) \exp(\theta \gamma_i z)$,

where $P_0(Z_i = z)$ is specified by the null (i.e., no
linkage) and $c_i(\theta) = [\sum_z P_0(Z_i = z) \exp(\theta \gamma_i z)]^{-1}$ is
the renormalization constant. When $Z_i$ is binary (e.g.,
as with half-sibs), the model is the same as the
logistic regression model

(11)    $\mathrm{logit}\, P_\theta(Z_i = 1) = \mu_i + \theta \gamma_i$,

where $\mu_i = \mathrm{logit}\, P_0(Z_i = 1)$.

Given the exponential tilting model or other sim- 
ilar models (e.g., the linear model of Kong and Cox 
(1997)), the log-likelihood can be calculated exactly 
for any missing data patterns under the assumption 
(9). Similar constructions can be done for multilocus 
models, as in Nicolae (1999). 
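
The exponential tilting construction is straightforward to compute for a single pedigree with fully observed score. The Python sketch below is a minimal illustration under that assumption; the function names, the parameter values and the 0/1 coding in the binary example are hypothetical choices made here, and the sketch does not handle missing data.

```python
import math


def exponential_tilt(null_pmf, theta, gamma=1.0):
    """Tilt a null pmf: P_theta(z) is proportional to P_0(z) exp(theta*gamma*z)."""
    weights = {z: p * math.exp(theta * gamma * z) for z, p in null_pmf.items()}
    inverse_c = sum(weights.values())  # this is 1 / c(theta)
    return {z: w / inverse_c for z, w in weights.items()}


def mean(pmf):
    return sum(z * p for z, p in pmf.items())


def logit(p):
    return math.log(p / (1.0 - p))


# Standardized sib-pair IBD score under the null: values -sqrt(2), 0, sqrt(2).
root2 = math.sqrt(2.0)
sib_null = {-root2: 0.25, 0.0: 0.5, root2: 0.25}
sib_alt = exponential_tilt(sib_null, theta=0.3)
print(mean(sib_null), mean(sib_alt))  # the tilted mean is shifted upward

# Binary score coded 0/1 (assumed coding): the tilt adds theta*gamma to the logit.
binary_null = {0: 0.7, 1: 0.3}
binary_alt = exponential_tilt(binary_null, theta=0.8, gamma=0.5)
print(logit(binary_alt[1]) - logit(binary_null[1]))
```

Setting theta to zero returns the null distribution, and positive theta shifts mass toward larger sharing scores, which is the direction expected under linkage.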

3. HAPLOTYPE-BASED ASSOCIATION 
STUDIES 

3.1 Basics of Association Studies 

Genetic association studies are designed to study 
potential associations between genetic variants and 
phenotypes (i.e., observable traits) on a population 
scale. The association between the genotype at a 
given marker and a disease can appear because the 
genetic variant may be a risk factor for the disease, 
or because the variant may be strongly correlated, 
called in linkage disequilibrium (LD) in the genetics 
literature, with a causal locus. The magnitude of the 
correlation depends on many factors including the 
distance between the markers and the population 
history. 

For the simplicity of description, we focus here on 
a simple and popular design, case-control studies, 
although most results and principles are applicable 
to other sampling designs including those that incor- 
porate quantitative traits and family-based controls. 
The simplest genetic variant and a commonly used 
genetic marker is a single nucleotide polymorphism 
(SNP) that takes on only two possible alleles. Denot- 
ing the two possible alleles as 1 and 2, there are three 
possible genotypes (1,1), (1,2) and (2,2). The data 
for a case-control study can then be summarized as 
a 2-by-3 table where the entries are counts of the 
three genotype categories for the cases and controls, 
respectively. The data can be further reduced to a 
2-by-2 table, where the entries are counts of the alleles,
if a multiplicative model (Terwilliger and Ott
(1992); Falk and Rubinstein (1987)) for allele-risk is
assumed. Note that under common assumptions, for
a person randomly selected from the population, the
two alleles carried are in Hardy-Weinberg equilibrium,
that is, they are independent. This might not
be true for an affected individual if the genotypes 
confer different risks, but it is true for the multi- 
plicative model. Since this model is true under the 
null hypothesis which assumes no difference between 
the two alleles, assuming the multiplicative model 
for the purpose of testing does not affect the valid- 
ity of the p-values. Obviously the power could be 
reduced if the specified model is different from the 
true alternative. 
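
The reduction from genotype counts to allele counts can be sketched as follows. The genotype counts in this Python example are made up for illustration, and the function name is hypothetical.

```python
def allele_table(genotype_counts):
    """Collapse a 2-by-3 genotype table into a 2-by-2 allele table.

    genotype_counts maps each group to counts of genotypes (1,1), (1,2), (2,2).
    Each person contributes two alleles, so the result counts chromosomes.
    """
    table = {}
    for group, (n11, n12, n22) in genotype_counts.items():
        allele_1 = 2 * n11 + n12
        allele_2 = n12 + 2 * n22
        table[group] = (allele_1, allele_2)
    return table


# Hypothetical genotype counts for 250 cases and 250 controls.
genotypes = {"cases": (90, 120, 40), "controls": (60, 130, 60)}
print(allele_table(genotypes))
# {'cases': (300, 200), 'controls': (250, 250)}
```

Treating the resulting allele counts as independent draws is justified only under the multiplicative model discussed above.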

When the causal locus genotypes are not part of 
the data, or when the LD between the markers is 
strong, it might be more efficient to use more than 
one marker simultaneously. Most of these multilo- 
cus approaches for fine-mapping of disease alleles 
are based on haplotypes (e.g., McPeek and Strahs 
(1999); Pritchard et al. (2000); Lam, Roeder and
Devlin (2000); Morris, Whittaker and Balding (2002);
Zöllner and Pritchard (2005)). Haplotype analyses
can be used to investigate untyped genetic variation 
(Pe'er et al. (2006); Nicolae (2006a)), and can be 
used to explore which markers could be causal and 
which are unlikely to be so. A haplotype is a sequence 
of alleles along a chromosome, and hence each per- 
son has two haplotypes. The alleles appearing in a 
haplotype are said to be in phase. If the haplotypes 
are directly observed, then standard methods for 
analyzing contingency tables could be used to test 
various models (Gretarsdottir et al. (2003)). Possi- 
ble scenarios range from having a candidate at-risk 
haplotype to testing the full model (all the haplo- 
types have different risks) versus the null model (all 
the haplotypes have the same risk). 

3.2 Causes of Incomplete Information 

With a case-control study conducted with indi- 
vidual SNPs separately, the sufficient statistic is a 
2-by-2 table under the multiplicative model and a 
likelihood ratio test can be used to test the null 
hypothesis. A common cause of incomplete infor- 
mation is missing genotypes since yield is often less 
than perfect. The situation becomes more compli- 
cated when multiple SNPs are considered jointly. 
With two SNPs, both having alleles denoted with 1
and 2, there are four possible haplotypes: 1-1
(characterized by allele 1 at both SNPs), 1-2, 2-1 and 2-2.
One simple alternative hypothesis is that haplotype 
1-1 has risk that is different from the other three 
haplotypes which are assumed to have the same risk. 
It could be that we believe the two SNPs are func- 
tional and there is interaction between them that 
leads to increased disease risk for haplotype 1-1, but 
more common is the hypothesis that the putative, 
but unobserved, mutation occurred in the 1-1 back- 
ground and the association between the haplotype 
and the trait is a result of both being associated 
with the mutation. 

Under the multiplicative model, if haplotypes can 
be observed directly, then this problem can again be 
reduced to a 2-by-2 table of haplotype counts where 
the haplotypes 1-2, 2-1 and 2-2 are collapsed into 
one. However, for the commonly used technology, 
SNPs are genotyped separately. For an individual, 
apart from incomplete information due to missing 
the genotype for one of the SNPs, there is the issue 
of uncertainty in phase. Specifically, if the genotypes 
for the first and second SNP are (1,2) and (1,2) re- 
spectively, then the two haplotypes could be either 
(1-1,2-2) or (1-2,2-1), that is, the information on 
phase is missing. In general, there is incomplete in- 
formation on phase if two or more SNPs that make 
up the haplotype are heterozygous. In family-based 
association studies (e.g., Abecasis, Cardon and Cookson
(2000); Martin et al. (2000); Lange and Laird
(2002a, 2002b)), the data on relatives will provide
additional information on phase but there will still 
be uncertainty in inferring the haplotypes. For SNPs 
that are close together physically, there exist typ- 
ing technologies that can determine the haplotypes 
directly, but they are usually much more expen- 
sive. Hence, from the design perspective, quanti- 
fying loss of information is relevant not only for 
power/sample-size calculations, but also for the choice 
of technology. 
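
Phase uncertainty can be made explicit by listing every pair of haplotypes consistent with a person's unphased genotypes. The Python sketch below does this by brute-force enumeration for biallelic SNPs with alleles coded 1 and 2; the function name and the input format are assumptions made for illustration.

```python
from itertools import combinations_with_replacement, product


def compatible_pairs(genotype):
    """List unordered haplotype pairs consistent with unphased genotypes.

    genotype is a tuple with one unordered allele pair per SNP,
    for example ((1, 2), (1, 2)) for a double heterozygote.
    """
    n_snps = len(genotype)
    haplotypes = list(product((1, 2), repeat=n_snps))
    target = tuple(tuple(sorted(g)) for g in genotype)
    pairs = []
    for h1, h2 in combinations_with_replacement(haplotypes, 2):
        implied = tuple(tuple(sorted(alleles)) for alleles in zip(h1, h2))
        if implied == target:
            pairs.append((h1, h2))
    return pairs


print(compatible_pairs(((1, 2), (1, 2))))  # two possible phasings
print(compatible_pairs(((1, 1), (1, 2))))  # phase is known
```

A person who is heterozygous at k of the SNPs has 2^(k-1) compatible pairs when k is at least one, so the ambiguity grows quickly with the number of heterozygous markers in the haplotype.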

3.3 Measuring Relative Information Via Test 
Statistics — a Two-Sample Example 

Apart from being relevant for experimental design 
and the interpretation of data, the amount of miss- 
ing information is also useful for understanding the 
behavior of certain testing procedures. While one 
obvious way to perform testing is to apply a likeli- 
hood ratio test based on actual likelihoods computed 
for the observed incomplete data under the null hy- 
pothesis and alternative hypothesis separately, soft- 
ware for such calculations which allows the user to 
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define models in a flexible manner is not readily 
available. However, available are methods and soft- 
ware based on the EM algorithm that can be applied 
to one sample to calculate maximum likelihood esti- 
mates of haplotype frequencies and expected haplo- 
type counts for individuals or groups assuming the 
maximum likelihood estimates are the true parame- 
ter values (Excoffier and Slatkin (1995); Hawley and 
Kidd (1995); Long, Williams and Urbanek (1995)). 
Other more sophisticated methods and software to 
predict haplotype phase and estimate counts also 
exist (e.g., Stephens, Smith and Donnelly (2001); 
Niu et al. (2002)). It is very tempting for the user 
to apply standard testing procedures, such as the 
likelihood ratio test, by simply treating these
expected/predicted counts as the actual observed counts.
Doing this is analogous to the example in Section 1.3, 
except here we are dealing with a two-sample prob- 
lem. 

Specifically, if the original EM computation is ap- 
plied to the cases and controls jointly as a single 
group (i.e., as under the null), but with the expec- 
tation counts tabulated for the individuals who are 
then separated into cases and controls, the test is 
conservative. If, however, the EM computation is 
applied to the cases and controls separately, then 
the result is anti-conservative. Moreover, the degree 
of conservativeness with the first procedure, in large 
samples, matches the degree of anti-conservativeness 
of the second procedure. To be more specific, con- 
sider the following simple example. Suppose the ob- 
served data consist of 250 patients and 250 con- 
trols, or 500 chromosomes each. For a SNP, the pa- 
tient counts are 300 allele 1 and 200 allele 2, and 
the control counts are 250 allele 1 and 250 allele 
2. Let $a$ and $u$ denote respectively the population
frequency of allele 1 in cases and controls. Under
the null, the maximum likelihood estimates are
$\tilde a = \tilde u = (300 + 250)/(500 + 500) = 0.55$ and the
maximum likelihood estimates under the alternative are
$\hat a = 300/500 = 0.6$ and $\hat u = 250/500 = 0.5$. Simple
calculations show that the log-likelihood ratio
statistic is

$$2[\ell(\hat a,\hat u) - \ell(\tilde a,\tilde u)] = 10.12.$$

Now suppose there are another 250 cases and 250 
controls each with no data yet. Suppose we treat 
these as missing data and apply the EM computation
to the cases and controls jointly. Since
$\tilde a = \tilde u = 0.55$, these extra cases and controls each have
expected counts of 275 allele 1 and 225 allele 2.
Together with the original counts, this gives 575 allele
1 and 425 allele 2 for the cases, and 525 allele 1 and 
475 allele 2 for the controls. The log-likelihood ratio 
statistic computed based on these counts is 5.05, 
approximately one-half of 10.12. 

By contrast, suppose the expected counts for the 
missing data are computed for the cases and con- 
trols separately. In this case, the presumed counts 
are simply twice the original counts: 600 allele 1 and 
400 allele 2 for the cases, and 500 allele 1 and 500 
allele 2 for the controls. The log-likelihood ratio 
statistic computed from these counts is 20.24, or ex- 
actly double that of 10.12. While this example is 
extremely simple and unrealistic, the phenomenon 
seen does extend to real data with haplotypes. In- 
deed, this is just another example of the relation- 
ships given in (4). That is, either ratio will correctly 
estimate that the relative information is about 50%. 
The theoretical results in the next section provide a 
general framework for such estimation. 
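
The three statistics quoted in this example can be checked numerically. The following is a minimal sketch, assuming a simple binomial likelihood for the allele counts in each group; the function name `lrt_two_sample` is introduced here purely for illustration and is not part of any software cited in the text.

```python
import math


def lrt_two_sample(case_counts, control_counts):
    """Log-likelihood ratio statistic 2*[l(alternative) - l(null)].

    Each argument is a pair (allele 1 count, allele 2 count).
    Under the null, cases and controls share one allele-1 frequency;
    under the alternative, each group has its own frequency.
    Counts may be fractional (e.g., expected counts from imputation).
    """
    a1, a2 = case_counts
    u1, u2 = control_counts
    p_case = a1 / (a1 + a2)
    p_ctrl = u1 / (u1 + u2)
    p_null = (a1 + u1) / (a1 + a2 + u1 + u2)

    def loglik(n1, n2, p):
        # Binomial log-likelihood up to an additive constant.
        return n1 * math.log(p) + n2 * math.log(1.0 - p)

    alt = loglik(a1, a2, p_case) + loglik(u1, u2, p_ctrl)
    null = loglik(a1, a2, p_null) + loglik(u1, u2, p_null)
    return 2.0 * (alt - null)


# Observed data only.
observed = lrt_two_sample((300, 200), (250, 250))
# Missing half imputed under the null (pooled frequency 0.55).
imputed_null = lrt_two_sample((575, 425), (525, 475))
# Missing half imputed separately for cases and controls.
imputed_alt = lrt_two_sample((600, 400), (500, 500))

print(round(observed, 2), round(imputed_null, 2), round(imputed_alt, 2))
# Both ratios estimate the relative information (about 50%).
print(imputed_null / observed, observed / imputed_alt)
```

The two printed ratios correspond to the conservative (imputation under the null) and anti-conservative (imputation under the alternative) procedures, respectively.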

4. A LARGE-SAMPLE FRAMEWORK 
4.1 Variations on the EM Identity 

Our large-sample framework is built upon a sim- 
ple identity involving expected log-likelihood ratios, 
where the expectation is with respect to the condi- 
tional distribution of the missing data given the ob- 
served data. Expected lod scores have also been used 
in the genetics literature to measure the informa- 
tion content of the data (Ott (2001)), and to inves- 
tigate optimality and validity of analytic strategies 
(e.g., Cleves and Elston (1997); Abreu, Greenberg 
and Hodge (1999); Daw, Thompson and Wijsman 
(2000)). Note that lod stands for logarithm (usually 
base 10) of the odds, and is used as a statistic for 
testing whether two loci are linked. 

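Specifically, let $Y_{co}$ be the complete data and $Y_{ob}$
be the observed data (note that here $Y_{ob}$ is a function
of $Y_{co}$). Let $\ell(\theta|D)$ be the log-likelihood of $\theta$
given data $D$. Then for any $\theta_1$ and $\theta_2$,

$$\ell(\theta_1|Y_{co}) - \ell(\theta_2|Y_{co}) = [\ell(\theta_1|Y_{ob}) - \ell(\theta_2|Y_{ob})] + [\log f(Y_{co}|Y_{ob},\theta_1) - \log f(Y_{co}|Y_{ob},\theta_2)]. \tag{12}$$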

By taking conditional expectation with respect to
$f(Y_{co}|Y_{ob},\theta)$, where $\theta$ is to be chosen, we have

$$\mathrm{E}[\mathrm{lod}(\theta_1,\theta_2|Y_{co})\,|\,Y_{ob},\theta] = \mathrm{lod}(\theta_1,\theta_2|Y_{ob}) + \mathrm{E}\left[\log\frac{f(Y_{co}|Y_{ob},\theta_1)}{f(Y_{co}|Y_{ob},\theta_2)}\,\Big|\,Y_{ob},\theta\right], \tag{13}$$

where $\mathrm{lod}(\theta_1,\theta_2|D)$ is the log of odds of $\theta_1$ over $\theta_2$
given data $D$. Here log can be of any base, and lod is
the log of the likelihood ratio, or more generally the 
log of posterior ratios. Identity (13) is a simple ex- 
tension of the key identity given in Dempster, Laird 
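and Rubin (1977) for the EM algorithm. Specifically,
using the notation of Dempster, Laird and Rubin
(1977),

$$Q(\theta|\theta') = \mathrm{E}[\ell(\theta|Y_{co})\,|\,Y_{ob},\theta'] \quad\text{and}\quad H(\theta|\theta') = \mathrm{E}[\log f(Y_{co}|Y_{ob},\theta)\,|\,Y_{ob},\theta'], \tag{14}$$

identity (13) is the same as

$$Q(\theta_1|\theta) - Q(\theta_2|\theta) = \ell_{ob}(\theta_1) - \ell_{ob}(\theta_2) + H(\theta_1|\theta) - H(\theta_2|\theta), \tag{15}$$

where $\ell_{ob}(\theta) = \ell(\theta|Y_{ob})$. In Dempster, Laird and Rubin
(1977), (15) was given with $\theta = \theta_2$, and was the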
basis for establishing the celebrated monotone con- 
vergence property of the EM algorithm. As we shall 
see, this intrinsic connection with the EM algorithm 
not only helps greatly our theoretical development 
in Section 6, but more importantly it enables us 
to compute our information measures directly from 
quantities that are already used for the EM compu- 
tation. 

Intuitively, if $\theta_1$ is the truth, then if we had more
data, which would come from $f(Y_{co}|Y_{ob},\theta_1)$, we would
on average have a larger lod score than $\mathrm{lod}(\theta_1,\theta_2|Y_{ob})$.
Indeed, by taking $\theta = \theta_1$ in (13) we see

$$\begin{aligned}
\mathrm{E}[\mathrm{lod}(\theta_1,\theta_2|Y_{co})\,|\,Y_{ob},\theta_1] &= \mathrm{lod}(\theta_1,\theta_2|Y_{ob}) + \mathrm{KL}(\theta_1:\theta_2) \\
&\ge \mathrm{lod}(\theta_1,\theta_2|Y_{ob}),
\end{aligned} \tag{16}$$

where $\mathrm{KL}(\theta_1:\theta_2) \ge 0$ is the Kullback-Leibler information
(in favor of $\theta_1$ against $\theta_2$ when $\theta_1$ is true)
contained in the conditional distribution of $Y_{co}$ given
$Y_{ob}$. The inequality in (16) becomes equality if and
only if $\mathrm{KL}(\theta_1:\theta_2) = 0$, which happens if and only
if $f(Y_{co}|Y_{ob},\theta_1) = f(Y_{co}|Y_{ob},\theta_2)$ (a.s.); that is, given
$Y_{ob}$, the additional data would contain no information
to discriminate $\theta_2$ from $\theta_1$. The Kullback-Leibler
distance has been used extensively in information
theory (e.g., Cover and Thomas (1991)) and mathematical
statistics (e.g., Aitchison (1975)). Recent
work on using K-L loss includes George, Feng and
Xu (2006) and references therein.

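Similarly, if $\theta_2$ is the truth, then on average we
would expect a smaller $\mathrm{lod}(\theta_1,\theta_2|Y_{co})$ if we had
observed $Y_{co}$. Mathematically, this is shown by taking
$\theta = \theta_2$ in (13), which leads to

$$\begin{aligned}
\mathrm{E}[\mathrm{lod}(\theta_1,\theta_2|Y_{co})\,|\,Y_{ob},\theta_2] &= \mathrm{lod}(\theta_1,\theta_2|Y_{ob}) - \mathrm{KL}(\theta_2:\theta_1) \\
&\le \mathrm{lod}(\theta_1,\theta_2|Y_{ob}),
\end{aligned} \tag{17}$$

and the inequality becomes equality if and only if,
as before, $f(Y_{co}|Y_{ob},\theta_1) = f(Y_{co}|Y_{ob},\theta_2)$.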

It is important to emphasize that because all the
expectations above are conditional upon $Y_{ob}$, it is
legitimate to allow any of the $\theta$'s to depend on $Y_{ob}$.
In particular, the null value $\theta_0$ in the rest of this
paper can be either a known fixed value when $H_0$
is a sharp null, or more generally the constrained
MLE of $\theta$ from $\ell(\theta|Y_{ob})$ under the null. It is also
important to emphasize that although in this sec- 
tion we focus on large-sample measures primarily 
because of their reliance on maximum likelihood es- 
timators (MLEs), as discussed below, all the equal- 
ities and inequalities discussed above do not involve 
any approximation, large sample or not. Therefore 
all measures discussed below can also be very use- 
ful for small samples, as long as the MLEs can be 
trusted (e.g., a small-sample MLE can have good 
properties, such as under the normal models). 

4.2 A Large-Sample Measure of Relative
Information Against $H_0$

Suppose the null value is $\theta_0$ and that the MLE of
$\theta$ (under $H_1$) given $Y_{ob}$ is $\hat\theta_{ob}$, and
$\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{ob})$ $(\ge 0)$ is used to assess the
evidence against $H_0: \theta = \theta_0$.
To avoid technical complexity that is not of general
interest for our proposals, we will assume (I)
$\hat\theta_{ob}$ is unique, an assumption typically automatic
with large samples, and (II) $\hat\theta_{ob} \ne \theta_0$, an assumption
rarely, if ever, violated in practice. (Nevertheless, for
theoretical completeness, we will consider the case
of $\hat\theta_{ob} = \theta_0$ in Section 6 via a limiting argument.)
Then, if we intend to measure the information in
the unobserved data for discrediting $H_0$, under the
large-sample assumption, a natural thing to do is to
treat $\hat\theta_{ob}$ as the "truth," and measure the expected
loss of lod in favor of $\hat\theta_{ob}$ relative to the expected
complete-data lod score. Namely, we can naturally
define

$$\mathcal{RI}_1 = \frac{\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{ob})}{\mathrm{E}[\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{co})\,|\,Y_{ob},\hat\theta_{ob}]} = \frac{\ell_{ob}(\hat\theta_{ob}) - \ell_{ob}(\theta_0)}{Q(\hat\theta_{ob}|\hat\theta_{ob}) - Q(\theta_0|\hat\theta_{ob})}. \tag{18}$$

The last expression shows that the computation of
$\mathcal{RI}_1$ only requires evaluations, at $\theta = \theta_0$ and
$\theta = \hat\theta_{ob}$, of the observed-data log-likelihood
$\ell_{ob}(\theta)$ and the $Q$ function, where the latter is
readily available from the EM algorithm.
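
To make the last expression in (18) concrete, here is a minimal sketch for a toy model that is not taken from the paper: `m` observed Bernoulli trials with `x` successes, plus `k` further trials whose outcomes are missing completely at random. Under that assumption the $Q$ function is the complete-data log-likelihood evaluated at the mean-imputed success count. All function names and the numbers used are hypothetical.

```python
import math


def loglik_ob(theta, x, m):
    """Observed-data log-likelihood (up to a constant) for x successes in m trials."""
    return x * math.log(theta) + (m - x) * math.log(1.0 - theta)


def q_function(theta, theta_prev, x, m, k):
    """Q(theta | theta_prev): expected complete-data log-likelihood.

    The k missing Bernoulli outcomes are replaced by their conditional
    mean under theta_prev (the E-step for this toy model).
    """
    s = x + k * theta_prev  # imputed complete-data success count
    n = m + k               # complete-data number of trials
    return s * math.log(theta) + (n - s) * math.log(1.0 - theta)


def ri1(theta0, x, m, k):
    """Relative information against H0, following the last expression in (18)."""
    theta_hat = x / m  # observed-data MLE
    num = loglik_ob(theta_hat, x, m) - loglik_ob(theta0, x, m)
    den = (q_function(theta_hat, theta_hat, x, m, k)
           - q_function(theta0, theta_hat, x, m, k))
    return num / den


# Hypothetical numbers: half of the complete data are missing.
value = ri1(theta0=0.5, x=300, m=500, k=500)
print(value)
```

In this toy model the ratio reduces algebraically to `m / (m + k)`, the fraction of trials actually observed, which is the kind of direct interpretation the measure is designed to have.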

Under assumptions (I) and (II), $\mathcal{RI}_1$ is well
defined and by (16), $0 \le \mathcal{RI}_1 \le 1$. It is 1 if and only if
$\mathrm{KL}(\hat\theta_{ob}:\theta_0) = 0$, which means that the missing data
cannot distinguish between $\hat\theta_{ob}$ and $\theta_0$ and thus there
is no missing information given $Y_{ob}$. It approaches
0 if and only if $\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{ob})/\mathrm{KL}(\hat\theta_{ob}:\theta_0) \to 0$, which
makes sense because if the observed-data likelihood
has diminishing ability, relative to that of the
missing-data model [as measured by $\mathrm{KL}(\hat\theta_{ob}:\theta_0)$], to
distinguish between $\hat\theta_{ob}$ and $\theta_0$, then as far as providing
evidence against $H_0$, the missing information
approaches 100%. One very appealing feature of $\mathcal{RI}_1$
is its direct interpretability. As seen in the haplotype
example in Section 3.3, a value of $\mathcal{RI}_1 = 0.5$
implies that if we had the complete data, the lod
score would be expected to be twice ($\mathcal{RI}_1^{-1} = 2$) as
large.

When $\ell(\theta|Y_{co})$ is linear in a (multidimensional)
summary statistic (i.e., a complete-data sufficient
statistic) $S(Y_{co})$, as when the complete-data model
is from an exponential family, $\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{co})$ can be
written as $\mathrm{lod}(\hat\theta_{ob},\theta_0|S(Y_{co}))$ and

$$\mathrm{E}[\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{co})\,|\,Y_{ob},\hat\theta_{ob}] = \mathrm{lod}(\hat\theta_{ob},\theta_0|S^*(Y_{ob})),$$

where $S^*(Y_{ob}) = \mathrm{E}[S(Y_{co})\,|\,Y_{ob},\hat\theta_{ob}]$. That is, $\mathcal{RI}_1$
measures the anti-conservativeness of the completed-data
test by pretending that the actual value of the
unobserved $S(Y_{co})$ is the same as its imputation under
the (estimated) alternative. Therefore, $\mathcal{RI}_1$ is the
general version of the first case in (4).

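This measure also has the following property when
combining data sets. Suppose $Y_{co} = \{Y_{co}^{(1)}, \ldots, Y_{co}^{(n)}\}$
are mutually independent and we define $\mathcal{RI}_{1,i}$ for each
$Y_{co}^{(i)}$ as in (18) but using $\hat\theta_{ob}$ instead of individual
$\hat\theta_{ob}^{(i)}$ (i.e., an MLE based on $Y_{ob}^{(i)}$); then the overall
$\mathcal{RI}_1$ is a weighted harmonic mean of the $\mathcal{RI}_{1,i}$'s
weighted by the individual lod score,
$\mathrm{lod}_i = \mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{ob}^{(i)})$, namely,

$$\mathcal{RI}_1 = \left[\frac{\sum_{i=1}^n \mathrm{lod}_i\,\mathcal{RI}_{1,i}^{-1}}{\sum_{i=1}^n \mathrm{lod}_i}\right]^{-1}. \tag{19}$$

However, the individual lod score, $\mathrm{lod}_i$, is not
necessarily always positive in practice, a problem that
is closely related to the problem of defining relative
measures for small data sets (e.g., for an individual
family), as discussed in Section 5. Note that $\mathcal{RI}_1$
can also be expressed as a weighted arithmetic mean
of the $\mathcal{RI}_{1,i}$ if we choose the weights to be proportional
to the expected individual complete-data lod score
$\mathrm{lod}_i^{co} = \mathrm{E}[\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{co}^{(i)})\,|\,Y_{ob}^{(i)},\hat\theta_{ob}]$:

$$\mathcal{RI}_1 = \frac{\sum_{i=1}^n \mathrm{lod}_i^{co}\,\mathcal{RI}_{1,i}}{\sum_{i=1}^n \mathrm{lod}_i^{co}}. \tag{20}$$

Clearly (19) and (20) are equivalent, as long as
$\mathcal{RI}_{1,i} > 0$. The harmonic rule (19) is somewhat more
appealing because of the direct interpretation of the
weight $\mathrm{lod}_i$.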
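
The equivalence of the harmonic rule (19) and the arithmetic rule (20) can be illustrated numerically. The sketch below uses made-up observed and expected complete-data lod scores for three independent units; the function names and numbers are hypothetical and serve only to show that both rules recover the ratio of the summed lod scores.

```python
def combine_harmonic(lod_ob, ri):
    """Rule (19): harmonic mean of the unit-level measures, weighted by observed lod."""
    return sum(lod_ob) / sum(l / r for l, r in zip(lod_ob, ri))


def combine_arithmetic(lod_co, ri):
    """Rule (20): arithmetic mean, weighted by expected complete-data lod."""
    return sum(c * r for c, r in zip(lod_co, ri)) / sum(lod_co)


# Hypothetical unit-level lod scores (all positive here).
lod_ob = [1.2, 0.4, 2.0]   # observed-data lod for each unit
lod_co = [1.5, 1.0, 2.5]   # expected complete-data lod for each unit
ri = [o / c for o, c in zip(lod_ob, lod_co)]  # unit-level relative information

overall = sum(lod_ob) / sum(lod_co)  # overall measure computed directly
harmonic = combine_harmonic(lod_ob, ri)
arithmetic = combine_arithmetic(lod_co, ri)
print(overall, harmonic, arithmetic)
```

Both rules break down in the same way when some unit has a non-positive lod score, which is the small-sample issue raised in the text.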

4.3 A Large-Sample Measure of Relative
Information Under $H_0$

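Inequality (17) also suggests a large-sample measure
of the relative information under $H_0$. By taking
$\theta_1 = \theta$ and $\theta_2 = \theta_0$ in (17) we obtain that

$$\begin{aligned}
\mathrm{E}[\mathrm{lod}(\theta,\theta_0|Y_{co})\,|\,Y_{ob},\theta_0] &= \mathrm{lod}(\theta,\theta_0|Y_{ob}) - \mathrm{KL}(\theta_0:\theta) \\
&\le \mathrm{lod}(\theta,\theta_0|Y_{ob}).
\end{aligned} \tag{21}$$

Thus, when the additional data are from $f(Y_{co}|Y_{ob},\theta_0)$
the expected complete lod score cannot exceed the
one based on the observed data, for any $\theta$. We can
use $\max_\theta \mathrm{E}[\mathrm{lod}(\theta,\theta_0|Y_{co})\,|\,Y_{ob},\theta_0]$, which cannot
exceed $\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{ob})$ by (21), as our best estimate
of the complete-data lod score; the use of a single
point estimate of the complete-data lod score without
considering its uncertainty can be justified under
the large-sample assumption. Consequently, we can
define

$$\mathcal{RI}_0 = \frac{\max_\theta \mathrm{E}[\mathrm{lod}(\theta,\theta_0|Y_{co})\,|\,Y_{ob},\theta_0]}{\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{ob})} = \frac{\max_\theta Q(\theta|\theta_0) - Q(\theta_0|\theta_0)}{\ell_{ob}(\hat\theta_{ob}) - \ell_{ob}(\theta_0)}. \tag{22}$$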

The last expression shows again the computational
efficiency of this measure because $\max_\theta Q(\theta|\theta_0)$ is
the same as carrying out the E-step and M-step of an
EM algorithm, by pretending the previous iterated
value is $\theta = \theta_0$. However, we emphasize that the use
of $\max_\theta \mathrm{E}[\mathrm{lod}(\theta,\theta_0|Y_{co})\,|\,Y_{ob},\theta_0]$ in our definition of
$\mathcal{RI}_0$ instead of $\mathrm{E}[\max_\theta \mathrm{lod}(\theta,\theta_0|Y_{co})\,|\,Y_{ob},\theta_0]$ is not
because this computation is easy, but rather because
of the nature of the fundamental identity (13), which
requires we maximize the expected complete-data
lod score.
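
The "one EM step started at the null" reading of (22) can be sketched for a toy model that is an assumption of this example, not a model from the paper: `m` observed Bernoulli trials with `x` successes and `k` trials missing completely at random. The M-step maximizer of $Q(\theta|\theta_0)$ is then available in closed form. Function names and numbers are hypothetical.

```python
import math


def loglik_ob(theta, x, m):
    """Observed-data log-likelihood (up to a constant)."""
    return x * math.log(theta) + (m - x) * math.log(1.0 - theta)


def q_function(theta, theta_prev, x, m, k):
    """Q(theta | theta_prev) with the k missing outcomes mean-imputed under theta_prev."""
    s = x + k * theta_prev
    n = m + k
    return s * math.log(theta) + (n - s) * math.log(1.0 - theta)


def ri0(theta0, x, m, k):
    """Relative information under H0, following the last expression in (22)."""
    theta_hat = x / m
    # One EM iteration started at theta0: E-step imputes k*theta0 successes,
    # M-step maximizes the resulting complete-data log-likelihood.
    theta_em = (x + k * theta0) / (m + k)
    num = q_function(theta_em, theta0, x, m, k) - q_function(theta0, theta0, x, m, k)
    den = loglik_ob(theta_hat, x, m) - loglik_ob(theta0, x, m)
    return num / den


# Hypothetical numbers: half of the complete data are missing.
value = ri0(theta0=0.5, x=300, m=500, k=500)
print(value)
```

With these numbers the result is close to, but not exactly, the observed fraction 0.5, consistent with the remark that the two measures are essentially equivalent when $\hat\theta_{ob}$ is near $\theta_0$.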

Like $\mathcal{RI}_1$, $0 \le \mathcal{RI}_0 \le 1$. Unlike $\mathcal{RI}_1$, however, the
investigation of when $\mathcal{RI}_0$ approaches one or zero
is a more complicated matter, especially when the
difference between $\hat\theta_{ob}$ and $\theta_0$ is large. This is a
partial reflection of the fact that $\mathcal{RI}_0$ is defined
under the assumption that the null hypothesis is
(approximately) valid, which would be contradicted by
a large value of $\delta = \hat\theta_{ob} - \theta_0$, especially under our
large-sample assumption. Therefore, it is more sensible
to investigate its theoretical properties when
$\delta$ is small, in which case it is essentially equivalent
to $\mathcal{RI}_1$, as we will establish in Section 6. Nevertheless,
it is useful to remark here that under the additional
assumption that $\hat\theta_{ob}$ is the unique stationary
point of $\ell_{ob}(\theta)$, the numerator of $\mathcal{RI}_0$ is zero if and
only if its denominator is zero, that is, if and only
if $\ell_{ob}(\hat\theta_{ob}) = \ell_{ob}(\theta_0)$. [The "if" part of this result is
a trivial consequence of (21). The "only if" part follows
from the fact that if the numerator is zero, then
$\theta_0$ is a maximizer of $Q(\theta|\theta_0)$, which means that $\theta_0$
must also be a stationary point of $\ell_{ob}(\theta)$ by (56) in
Appendix A.2.] This demonstrates that in order for
$\mathcal{RI}_0$ to be very small, the observed-data likelihood
must suffer a diminishing ability to distinguish between
$\hat\theta_{ob}$ and $\theta_0$, just as with $\mathcal{RI}_1$.

Also as with $\mathcal{RI}_1$, when $\ell(\theta|Y_{co})$ is linear in $S(Y_{co})$,
$\mathcal{RI}_0$ can be computed simply as

$$\mathcal{RI}_0 = \frac{\max_\theta \mathrm{lod}(\theta,\theta_0|S_0^*(Y_{ob}))}{\mathrm{lod}(\hat\theta_{ob},\theta_0|Y_{ob})},$$

where $S_0^*(Y_{ob}) = \mathrm{E}(S(Y_{co})\,|\,Y_{ob},\theta_0)$, that is, the mean
imputation of the missing $S(Y_{co})$ under the null.
Therefore, $\mathcal{RI}_0$ is the general version of the second
case in (4), and it measures the conservativeness of
our test when we impute under the null. Its main
disadvantage, as previously mentioned, is that it can
provide very misleading information when the true
$\theta$ is far away from the null. On the other hand,
because it is computed at the null, it is less sensitive,
compared to $\mathcal{RI}_1$, to possible misspecification of the
alternative model. We will illustrate this in Section
6.3, where we will discuss further the pros and cons
of $\mathcal{RI}_0$.

4.4 Illustration With a Linkage Analysis 

In the context of allele-sharing methods, the measures
we introduced in the previous sections are implemented
in the software ALLEGRO (Gudbjartsson et al. (2000)),
and are discussed in detail in Nicolae and Kong (2004).
In Figure 2, $\mathcal{RI}_1$ and $\mathcal{RI}_0$ are
plotted for various locations along chromosome 22
(the unit for the X-axis is centimorgans) in a data
set consisting of 127 pedigrees used in an inflamma- 
tory bowel disease study (Cho et al. (1998)). It can 
be seen that, in this case, the two measures are very 
close across the entire chromosome. This happens 
because the sample size is large and the distribu- 
tion of the family sharing scores is fairly symmetric. 
Also plotted is an inheritance-vector-based information
measure calculated by the software GENEHUNTER
(Kruglyak et al. (1996)). This measure
takes advantage of the fact that the inheritance vectors
are equally likely under $H_0$ and that, for the
fixed support of the space of the inheritance vectors,
the Shannon entropy (1949) is maximal for the uniform
distribution on the support. For the $i$th pedigree
in the study and a given position $t$, it is defined
as

$$\mathcal{E}_i(t) = 1 - \frac{-\sum_{\omega_i} P(\omega_i|\text{data},H_0)\log_2 P(\omega_i|\text{data},H_0)}{-\sum_{\omega_i} P(\omega_i|H_0)\log_2 P(\omega_i|H_0)},$$

where $\omega_i$ was defined in Section 2.1. The definition
of the overall information content of a data set is 
based on the global entropy, which, summed over 
all n pedigrees, satisfies 



Fig. 2. The large-sample measures of information are plotted
against the genetic distance. The top two curves (almost
identical) correspond to $\mathcal{RI}_1$ and $\mathcal{RI}_0$; the bottom curve
(dot-dashed) corresponds to the entropy-based measure (Kruglyak
et al. (1996)).
While the entropy-based measure has several desired properties (e.g., it is
always between zero and one, and it is one when 
there is perfect data on the inheritance vectors), it 
has some deficiencies that make it unsuitable for the 
linkage application. The most fundamental problem 
is that it measures the relative information in the 
whole inheritance vector space, which could be very 
different from what is available for a particular test 
statistic that is a function of the inheritance vec- 
tors. For example, in the right diagram of Figure 1, 
we may be nearly certain, and hence suffer very lit- 
tle missing information, that the IBS sharing is ac- 
tually IBD if we have the knowledge that the al- 
lele "A2" has very low population frequency, even 
though the parental alleles are unknown and therefore
the entropy-based measure is low (see Nicolae and Kong (2004) for more
details). It is also possible that the entropy-based measure is higher than the
measures described in this paper (e.g., Thalamuthu 
et al. (2005)), for example in situations where there 
is a lot of data on unaffected individuals in a fam- 
ily, but little or no data on affected individuals. In 
these cases, the entropy-based measure will capture available information
that is not directly of interest when we are perform- 
ing affecteds-only analyses. 
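
For reference, an entropy-based information content of the kind described above (one minus the ratio of posterior to prior entropy over the inheritance vectors) can be sketched as follows. This is an illustration under the assumption of a small, fully enumerated inheritance-vector space with a uniform prior under $H_0$; it is not the GENEHUNTER implementation, and the function names and probabilities are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_information(posterior, prior):
    """One minus the ratio of posterior entropy to prior (null) entropy."""
    return 1.0 - entropy_bits(posterior) / entropy_bits(prior)


# Hypothetical pedigree with four equally likely inheritance vectors under H0.
prior = [0.25, 0.25, 0.25, 0.25]

perfect = entropy_information([1.0, 0.0, 0.0, 0.0], prior)   # vector known exactly
none = entropy_information(prior, prior)                     # data carry no information
partial = entropy_information([0.5, 0.5, 0.0, 0.0], prior)   # two vectors ruled out
print(perfect, none, partial)
```

Note that this quantity depends only on how concentrated the posterior over inheritance vectors is, not on the test statistic, which is the source of the discrepancy discussed in the text.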

The serious overestimation or underestimation of 
relative information can have a great impact on the 
design of follow-up studies. One can decide on in- 
creasing the marker density if the relative informa- 
tion is low, as opposed to increasing the sample 
size. Both strategies are expensive, and therefore de- 
ciding what is the most efficient design is of great 
importance in practice. For example, at the global 
mode in Figure 2, our measures indicate that we 
have about 90% relative information, implying that 
potentially we can increase the lod score by only 
about 11% (1/0.9 = 1.11) if we add markers to make 
the IBD process approximately known (assuming 
the value of $\hat\theta_{ob}$ remains approximately the same
with the additional data). On the other hand, the 
entropy-based measure from GENEHUNTER indi- 
cates that we have about 70% information, suggest- 
ing that a more substantial gain (over 40%) is possi- 
ble by increasing the density of the markers. There- 
fore these two approaches of measuring information 
are likely to lead to different strategies in allocating 
the resources, but evidently, in this example, it is 
unlikely the test results will change significantly by 
adding more markers near the location at the global 
mode. 
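
The arithmetic behind the two percentages is simply the reciprocal of the relative information. A minimal sketch, where `potential_lod_gain` is a name introduced here for illustration:

```python
def potential_lod_gain(relative_information):
    """Expected proportional increase in the lod score if the data were complete."""
    return 1.0 / relative_information - 1.0


gain_likelihood_based = potential_lod_gain(0.9)  # measures proposed in this paper
gain_entropy_based = potential_lod_gain(0.7)     # entropy-based measure
print(round(gain_likelihood_based, 2), round(gain_entropy_based, 2))
```

These are the "about 11%" and "over 40%" figures quoted above, under the stated assumption that $\hat\theta_{ob}$ would stay approximately the same with the additional data.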



4.5 Illustration With a Haplotype-Based Study 

In Grant et al. (2006), the gene TCF7L2 was 
found to be associated with type-2 diabetes. In particular,
allele T of rs7903146 (SNP402) and allele
X of a microsatellite marker DG10S478 are both associated
with elevated risk of type-2 diabetes ($p$-value $< 10^{-10}$).
Allele T and allele X are substantially
correlated ($r \approx 0.85$) and their effects could
not be clearly distinguished from each other in the 
original study. However, with additional data (Hel- 
gason et al. (2007)), it became clear that allele T is 
more strongly associated with diabetes than allele 
X. SNP402 has alleles T and C, and DG10S478 has
alleles X and 0. Jointly there are four haplotypes: 
TX, CX, TO and CO. Figure 3 presents pairwise 
comparisons of these four haplotypes. Data are from 
1021 patients {n = 2042 chromosomes) and 4273 con- 
trols (m = 8546 chromosomes). Consistent with the 
single marker associations, haplotype TX is found 
to have elevated risk relative to CO. To distinguish 
between the effects of alleles T and X, haplotype 
TO is found to confer risk that is similar to that of 
TX and has significantly higher risk than CO. By 
contrast, haplotype CX is found to have risk similar 
to that of CO and significantly lower risk than TX. 
In other words, given SNP402, DG10S478 does not 
appear to provide extra information about diabetes, 
which keeps SNP402 as a strong candidate for being 
the functional variant. 

The yield of the genotypes is not perfect. Each 
subject has genotypes for at least one of the two 
markers, but about 3.5% of the genotypes are miss- 
ing. This together with uncertainty in phase leads to 
the incomplete information summarized in Figure 3. 
Interestingly, while the same data are used for the 
six pairwise comparisons, the fraction of missing in- 
formation can be quite different. Most striking is 
that the relative information for the test of TX ver- 
sus CO is very close to 100%, while the other tests 
all have more substantial missing information. We 
explore some of the reasons below. 

Notice that T is highly correlated with X and 
C highly correlated with 0. As a consequence, TX 
and CO are much more common than TO and CX. 
Consider a subject whose genotype for DG10S478
is missing. Here we can think of his two alleles for 
SNP402 one at a time. Given an observed allele T, it 
is clear that the haplotype is not CO and quite likely 
to be TX. Hence, even though incomplete, there is 
still substantial information provided for the test 
Fig. 3. For each haplotype, estimated frequencies in patients and controls are displayed. RR is the estimated risk of the
haplotype the arrow is pointing to ($h_1$) relative to the haplotype the arrow is pointing from ($h_2$), and is calculated as
$[n(h_1)/m(h_1)]/[n(h_2)/m(h_2)]$, where $n$ and $m$ are estimated haplotype counts in patients and controls respectively. P values
are calculated based on a likelihood ratio test that properly takes missing information into account. Information shown is
$\mathcal{RI}_1$. Very similar numbers are obtained for $\mathcal{RI}_0$.



of TX versus CO. By contrast, we know that this 
chromosome is useful for the test of TX against TO, 
but with the allele of DG10S478 missing, that in- 
formation is completely lost. Even more interesting 
is that, if the observed allele is C instead, then this 
haplotype is completely uninformative for the test of 
TX versus TO, that is, there is actually no informa- 
tion here whether or not we know the corresponding 
DG10S478 allele. In effect, the genotype of SNP402 
is an ancillary statistic for the test of TX against TO 
(or CX against CO). It tells us how much informa- 
tion we can get from this individual assuming that 
we have no missing data, but by itself does not pro- 
vide any information for the test. Moreover, if the 
test of TX versus TO is of key interest, then effort 
to fill up missing genotypes for DG10S478 should be 
focused on those individuals that are T/T homozy- 
gous for SNP402. 

When genotypes of both markers are observed, 
uncertainty in phase only exists for those individuals 
that are doubly heterozygous, that is, having geno- 
types C/T and 0/X. Such an individual either has 
haplotypes CO/TX (scenario I) or CX/TO (scenario 
II). Scenario II provides no information for the test 
of TX versus CO. Scenario I does contribute some- 
thing to the test, but by providing a count of 1 to 
both TX and CO, its impact on the test of TX versus 
CO is rather limited. By contrast, for the test of TX 
versus TO, scenario I adds a count of 1 to TX while 
scenario II adds a count to TO. Hence, uncertainty
in phase has a much bigger impact on the test of TX
versus TO than the test of TX versus CO. This example,
therefore, illustrates clearly the importance
of measuring test-specific relative information. 
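
The phase ambiguity for a doubly heterozygous individual can be made concrete with a short sketch of the expected-count calculation. It assumes Hardy-Weinberg proportions, so the probability of each phase resolution is proportional to the product of the two haplotype frequencies; the haplotype frequencies below are invented for illustration and are not the estimates shown in Figure 3.

```python
def phase_posterior(freq):
    """Probability of scenario I (CO/TX) for an individual with genotypes C/T and 0/X.

    Assumes Hardy-Weinberg proportions: each scenario has probability
    proportional to the product of its two haplotype frequencies
    (the common factor of 2 cancels).
    """
    w1 = freq["TX"] * freq["CO"]  # scenario I: haplotypes CO/TX
    w2 = freq["CX"] * freq["TO"]  # scenario II: haplotypes CX/TO
    return w1 / (w1 + w2)


def expected_haplotype_counts(freq):
    """Expected counts contributed by one doubly heterozygous individual."""
    p1 = phase_posterior(freq)
    return {"TX": p1, "CO": p1, "CX": 1.0 - p1, "TO": 1.0 - p1}


# Hypothetical haplotype frequencies with T/X and C/0 strongly correlated.
freq = {"TX": 0.25, "CO": 0.65, "CX": 0.04, "TO": 0.06}
p_scenario_1 = phase_posterior(freq)
counts = expected_haplotype_counts(freq)
print(round(p_scenario_1, 4), counts)
```

Because scenario I adds equally to TX and CO, resolving the phase changes the TX versus CO comparison little, whereas it moves a whole count between TX and TO.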

5. SMALL-SAMPLE EXPLORATORY 
MEASURES 

5.1 A Bayesian Framework 

The measures defined in previous sections do not 
necessarily work with small samples (e.g., data for 
one family) because they rely on the ability of the 
MLE to summarize the whole likelihood function. 
The Bayesian approach becomes a valuable tool in 
such settings even if we do not necessarily have a 
reliable prior; we can first construct a coherent mea- 
sure and then investigate the choice of prior. Since 
a likelihood quantifies the information in the data 
through its ability of distinguishing different values 
of the parameter, it is natural to consider measur- 
ing the relative information by comparing how the 
observed-data likelihood deviates from "flatness" rel- 
ative to the same deviation in the complete-data 
likelihood. The Bayesian method is ideal here be- 
cause we need to assess the change in this deviation 
due to the joint variability in the missing data and in 
the parameter. A reasonable measure of this deviation,
conditioning on $Y_{ob}$, is the posterior variance of
the likelihood ratio (LR). This measure is appealing
because it is naturally scaled via the equality

$$\mathrm{LR}(\theta_0,\theta|Y_{ob}) = \mathrm{E}[\mathrm{LR}(\theta_0,\theta|Y_{co})\,|\,Y_{ob},\theta], \tag{23}$$

which guarantees that

$$0 \le \mathcal{BI}_1^{\pi} \equiv \frac{\mathrm{Var}[\mathrm{LR}(\theta_0,\theta|Y_{ob})\,|\,Y_{ob}]}{\mathrm{Var}[\mathrm{LR}(\theta_0,\theta|Y_{co})\,|\,Y_{ob}]} \le 1, \tag{24}$$

where $\pi$ indexes the underlying prior on $\theta$ used by (24)
and BI stands for "Bayes Information." We assume 
here that the complete-data likelihood surface is not 
flat, as otherwise the model is of little interest. The 
denominator in (24) is therefore positive. We also 
need to assume that the posterior variances of the 
two likelihood ratios are finite. This second assump- 
tion can be violated in practice, but a second mea- 
sure we will propose below essentially circumvents 
this problem. 
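
The step from (23) to the bounds in (24) can be spelled out in two lines. The following is a sketch in the notation above, reading $\mathrm{LR}(\theta_0,\theta|D)$ as $f(D|\theta_0)/f(D|\theta)$ and assuming that $Y_{ob}$ is a function of $Y_{co}$, that the support of $f(Y_{co}|Y_{ob},\theta)$ does not depend on $\theta$, and that the posterior variances are finite:

```latex
\begin{align*}
\mathrm{E}[\mathrm{LR}(\theta_0,\theta|Y_{co}) \mid Y_{ob},\theta]
  &= \int \frac{f(y_{co}|\theta_0)}{f(y_{co}|\theta)}\, f(y_{co}|Y_{ob},\theta)\, dy_{co}
   = \frac{f(Y_{ob}|\theta_0)}{f(Y_{ob}|\theta)} \int f(y_{co}|Y_{ob},\theta_0)\, dy_{co}
   = \mathrm{LR}(\theta_0,\theta|Y_{ob}), \\[4pt]
\mathrm{Var}[\mathrm{LR}(\theta_0,\theta|Y_{co}) \mid Y_{ob}]
  &= \mathrm{Var}\{\mathrm{E}[\mathrm{LR}(\theta_0,\theta|Y_{co}) \mid Y_{ob},\theta] \mid Y_{ob}\}
   + \mathrm{E}\{\mathrm{Var}[\mathrm{LR}(\theta_0,\theta|Y_{co}) \mid Y_{ob},\theta] \mid Y_{ob}\} \\
  &= \mathrm{Var}[\mathrm{LR}(\theta_0,\theta|Y_{ob}) \mid Y_{ob}]
   + \mathrm{E}\{\mathrm{Var}[\mathrm{LR}(\theta_0,\theta|Y_{co}) \mid Y_{ob},\theta] \mid Y_{ob}\}.
\end{align*}
```

The first line uses the factorization $f(y_{co}|\theta) = f(Y_{ob}|\theta)\, f(y_{co}|Y_{ob},\theta)$; the second is the law of total variance combined with (23). Since the last term is nonnegative, the ratio in (24) lies in the unit interval.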

In the presence of nuisance parameters (under the
null), there is also a subtle issue regarding the nuisance
part of $\theta_0$ in the definition of $\mathcal{BI}_1^{\pi}$. For a full
Bayesian calculation, one should leave it unspecified
and average it over in the posterior calculation, just
as with the $\theta$ in $\mathrm{LR}(\theta_0,\theta)$. On the other hand, to be
consistent with the large-sample measures as defined
in Section 4, we can fix the nuisance parameter part
in $\theta_0$ by its observed-data MLE under the null.
Identity (23) still holds with such a "fix," because the
calculation there conditions on the observed data. 
This "fix" may seem to be rather ad hoc from a pure 
Bayesian point of view. However, it can be viewed as 
an attempt in capturing the dependence (if any) be- 
tween the parameter of interest and the nuisance pa- 
rameter under the null, a dependence that is ignored 
by a single prior on the nuisance parameter regard- 
less of the null. This subtle issue is related to the dif- 
ference between "estimation prior" and "hypothesis 
testing prior," an issue we will explore in subsequent 
work. Here we just note that all the Bayesian mea- 
sures defined in this section can be constructed with 
either approach for handling the nuisance parameter 
under the null, although those under shrinking prior 
toward the null (see Section 5.2) are most easily ob- 
tained when the nuisance parameter under the null 
is fixed at its MLE (or some other known values). 

With either approach, $\mathcal{BI}_1^{\pi} = 1$ if and only if

$$\mathrm{E}\{\mathrm{Var}[\mathrm{LR}(\theta_0,\theta|Y_{co})\,|\,Y_{ob},\theta]\,|\,Y_{ob}\} = 0,$$

which occurs if and only if for almost all the $\theta$ in
the support of the posterior, the complete-data likelihood
ratio $\mathrm{LR}(\theta_0,\theta|Y_{co})$ is (almost surely) a constant as
a function of the missing data, and thus the missing
data would offer no additional help in distinguishing
$\theta$ from $\theta_0$. On the other hand, $\mathcal{BI}_1^{\pi} = 0$ if and only if
the observed-data likelihood ratio is a constant, and
thus there is no information in the observed data for
testing $H_0$ using $\mathrm{LR}(\theta_0,\theta|Y_{ob})$. Other characteristics
of this measure depend on the choice of the prior $\pi$,
and they will be discussed in the following sections.

One potential drawback of $\mathcal{BI}_1^{\pi}$ is that it can be
greatly affected by the large variability in the likelihood
ratios, as functions of the parameters, for example,
when very unlikely parameter values were
given nontrivial prior mass. This problem can be
circumvented to a large extent by using the posterior
variance of the log-likelihood ratio,

$$\mathrm{Var}[\mathrm{lod}(\theta,\theta_0|Y_{ob})\,|\,Y_{ob}].$$

The use of the log scale also makes it much more
likely, compared to the ratio scale, that the resulting
posterior variances are finite. Evidently, just as
with the posterior variance of the likelihood ratio,
this is equal to zero if and only if the observed-data
likelihood ratio is a constant (on the support of the
posterior). Similarly,

$$\mathrm{Var}\left[\log\frac{P(Y_{co}|Y_{ob},\theta)}{P(Y_{co}|Y_{ob},\theta_0)}\,\Big|\,Y_{ob}\right]$$

is equal to zero if and only if there is no additional
information in the missing data for testing $H_0$. These
suggest that we can also measure the relative information
by

$$\mathcal{BI}_2^{\pi} = \frac{\mathrm{Var}[\mathrm{lod}(\theta,\theta_0|Y_{ob})\,|\,Y_{ob}]}{\mathrm{Var}[\mathrm{lod}(\theta,\theta_0|Y_{ob})\,|\,Y_{ob}] + \mathrm{Var}\left[\log\dfrac{P(Y_{co}|Y_{ob},\theta)}{P(Y_{co}|Y_{ob},\theta_0)}\,\Big|\,Y_{ob}\right]}, \tag{25}$$

where, as for $\mathcal{BI}_1^{\pi}$, $\pi$ indexes the underlying prior
on $\theta$.

Although the use of lod is more natural in view 
of the large-sample measures given in Section 4, it 
does not admit the nice "coherence" identity for the 
likelihood ratio as given in (23). Indeed, we had to 
remove ad hoc a cross term in the denominator of 
(25) in order to keep the resulting ratio always in- 
side the unit interval. Furthermore, as we show in 
Section 6, the use of the ratio scale, instead of log 
ratio, leads to a number of interesting identities be- 
tween likelihood ratios and Bayes factors, and it is 
more connected with some finite-sample measure of 
information in the literature. Whereas such trade- 
offs need to be explored, our general results in the 
next section imply that in the neighborhood of $\theta_0$
the differences between these two measures should
be small. 

5.2 Limits Under a Shrinking Prior Toward Null 

Given their definitions, the immediate question is
how to choose $\pi$ and how to compute $\mathcal{BI}_1^{\pi}$ and $\mathcal{BI}_2^{\pi}$
efficiently since, in general, their calculations require
integrations that cannot be performed analytically.
When the truth is believed to be in a neighborhood
of the null value $\theta_0$, a $\theta_0$-neighbor approximation to
$\mathcal{BI}_1^{\pi}$ and $\mathcal{BI}_2^{\pi}$ can be obtained by choosing $\pi$ to be
$U(\theta_0 - \delta, \theta_0 + \delta)$ with $\delta > 0$ small. It is proved in
Appendix A.1 that the two Bayesian measures have
the same limit as $\delta \to 0$, denoted by $\mathcal{BI}_0$,

$$\mathcal{BI}_0 = \frac{S^2(\theta_0|Y_{ob})}{S^2(\theta_0|Y_{ob}) + I_{mi}(\theta_0|Y_{ob})} = \frac{S^2(\theta_0|Y_{ob})}{S^2(\theta_0|Y_{ob}) + \mathrm{Var}(S(\theta_0|Y_{co})\,|\,Y_{ob},\theta_0)}, \tag{26}$$

where $S(\theta|Y_{ob})$ and $S(\theta|Y_{co})$ are respectively the
observed-data and complete-data score function, and
$I_{mi}(\theta|Y_{ob})$ is the expected (missing) Fisher information
from $f(Y_{co}|Y_{ob},\theta)$. Note that although this result
obviously assumes $\theta$ is univariate, it can also be
applied when only the parameter of interest is univariate,
if we fix the nuisance parameter part in $\theta_0$
at its observed-data MLE under the null.

For the exponential tilting linkage model, one can
verify that

$$\mathcal{BI}_0 = \frac{W^2}{W^2 + \mathrm{Var}(Z|\text{data},H_0)} = 1 - \frac{\mathrm{Var}(Z|\text{data},H_0)}{W^2 + \mathrm{Var}(Z|\text{data},H_0)}, \tag{27}$$

where $W = \mathrm{E}(Z|\text{data},H_0)$, and $Z$ is given in (5).
Therefore its computation is straightforward because
it only depends on the test statistic and the null
hypothesis. Note also that the expectation of the
denominator in (27) under the null is simply $\mathrm{Var}(Z|H_0) = 1$.
Therefore, if we replace the denominator in (27)
by its expected value under the null, we obtain an
even simpler approximation $\mathcal{BI}_0 \approx 1 - \mathrm{Var}(Z|\text{data},H_0)$.
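
Given the two summaries $W$ and $\mathrm{Var}(Z|\text{data},H_0)$, the measure in (27) and its simpler approximation are one-liners. The sketch below takes those summaries as inputs rather than computing them from pedigree data; the function names and the numerical inputs are hypothetical.

```python
def bi0(w, var_z):
    """Measure (27): W^2 / (W^2 + Var(Z | data, H0))."""
    return w * w / (w * w + var_z)


def bi0_approx(var_z):
    """Approximation obtained by replacing the denominator in (27) with its null expectation 1."""
    return 1.0 - var_z


# Hypothetical summaries of the conditional distribution of Z given the data.
print(bi0(1.0, 1.0))     # equal contributions from W^2 and the conditional variance
print(bi0(1.5, 0.0))     # Z fully determined by the data
print(bi0(0.0, 0.3))     # W = 0: the measure is zero regardless of the variance
print(bi0_approx(0.3))
```

The third call illustrates the limitation discussed next: whenever $W = 0$ the measure is zero, however small the conditional variance.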

However, $\mathcal{BI}_0$ measures only the relative information
in the neighborhood of $\theta_0$. For example, suppose
the data consist of one affected sib-pair like in
Figure 1 such that both parents and the sibs are
heterozygous with the same pair of alleles at a specific
locus (i.e., all individuals have the alleles "A1"
and "A2"). In this case, the observed-data likelihood
from the exponential tilting model is very informative
away from $\theta_0$ (see Figure 4), but $\mathcal{BI}_0 = 0$
because the null value $\theta_0 = 0$ turns out to be the
minimizer of the observed-data likelihood.

Fig. 4. Log-likelihood ratio for a sib-pair where the parents
and sibs are IBS for a heterozygous genotype.

In general, whenever $\theta_0$ is a stationary point of $\ell(\theta|Y_{ob})$, $BI_0 = 0$, even if there is almost perfect information. For example, if the data consist of $2n + 1$ sib-pairs such that there is complete information on $2n$ sib-pairs, $n$ sharing 0 alleles IBD and $n$ sharing 2 alleles IBD, and one sib-pair has no information, then $W = 0$ and thus $BI_0 = 0$. This is clearly a misleading measure. In the next section we propose a remedy for this problem.

5.3 Combining Individual Information Measures 

The measures defined in Section 5.1 are inherently 
small-sample quantities, for the variance terms used 
in these measures do not naturally admit additiv- 
ity even for i.i.d. data structures. Whether one can 
find a satisfying small-sample measure that would 
automatically admit such additivity is a topic of 
both theoretical and practical interest, but for our 
current purposes we can impose such additivity by 
defining global measures via appropriate combin- 
ing rules, such as (19). We adopt such rules mainly 
to maintain the continuity of moving from small-sample
to large-sample measures as proposed in Section 4.
Whether these are the most sensible rules is
a topic that requires further research.

Specifically, suppose our data consist of $n$ independent "small units" (e.g., individual families), $Y_{ob}^{(i)}$, $i = 1, \ldots, n$. We apply (24) to each unit and then combine them via the harmonic rule (19) but with weights proportional to $V_i = \mathrm{Var}[LR(\theta_0,\theta|Y_{ob}^{(i)}) \mid Y_{ob}^{(i)}]$. In other words, we define the measure for the aggregated data by first summing up both the numerators and denominators of individual $BI_{1,i}^\pi$ and then taking the ratio. That is,

$$
BI_1^\pi = \frac{\sum_{i=1}^n \mathrm{Var}[LR(\theta_0,\theta|Y_{ob}^{(i)}) \mid Y_{ob}^{(i)}]}{\sum_{i=1}^n \mathrm{Var}[LR(\theta_0,\theta|Y_{co}^{(i)}) \mid Y_{ob}^{(i)}]}
         = \left[\sum_{i=1}^n \frac{V_i}{\sum_{j=1}^n V_j}\left(BI_{1,i}^\pi\right)^{-1}\right]^{-1}. \tag{28}
$$



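Similarly, we can define the combined version for $BI_2^\pi$ from individual $BI_{2,i}^\pi$, and we can also use the arithmetic combining rule (20). In addition, its limit under the shrinking prior, in analogy to (26), can be expressed as

$$
BI_S = \frac{\sum_{i=1}^n S^2(\theta_0|Y_{ob}^{(i)})}{\sum_{i=1}^n S^2(\theta_0|Y_{ob}^{(i)}) + \sum_{i=1}^n I_{mi}(\theta_0|Y_{ob}^{(i)})}
     = \frac{\sum_{i=1}^n S^2(\theta_0|Y_{ob}^{(i)})}{\sum_{i=1}^n S^2(\theta_0|Y_{ob}^{(i)}) + I_{mi}(\theta_0|Y_{ob})}, \tag{29}
$$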



where $I_{mi}(\theta|Y_{ob})$ is the expected Fisher information matrix from $f(Y_{co}|Y_{ob},\theta)$, with $Y_{ob} = \{Y_{ob}^{(1)}, \ldots, Y_{ob}^{(n)}\}$. We have changed the notation from $BI_0$ to $BI_S$ to signify the fact that the latter measure is defined by summing up the numerators and denominators of the individual $BI_0$'s separately before forming the combined ratio. The second equation in (29) holds because of the additivity of Fisher information for independent data structures. For the exponential tilting linkage model, this averaging for a shrinking prior leads to

$$
BI_S = \frac{\sum_{i=1}^n W_i^2}{\sum_{i=1}^n W_i^2 + \sum_{i=1}^n \mathrm{Var}(Z_i|\text{data}, H_0)}
     = \frac{\sum_{i=1}^n W_i^2/n}{\sum_{i=1}^n W_i^2/n + \mathrm{Var}(Z|\text{data}, H_0)},
$$

where $W_i = E(Z_i|\text{data}, H_0)$ and $Z = \sum_{i=1}^n Z_i/\sqrt{n}$. This is equal to zero only if all the $W_i$'s are equal to zero, as opposed to using a global posterior, that is, by applying (26) directly to the whole data set, where $\sum_i W_i = 0$ is sufficient to cause $BI_0 = 0$. This difference is an important advantage for $BI_S$, as we will demonstrate in Section 6.3.
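
The contrast between the global and the combined measure can be illustrated with a small sketch. It assumes $Z = \sum_i Z_i/\sqrt{n}$ with independent units, so that $W = \sum_i W_i/\sqrt{n}$ and $\mathrm{Var}(Z|\text{data},H_0) = \sum_i \mathrm{Var}(Z_i|\text{data},H_0)/n$. The per-family values below are hypothetical stand-ins (ten fully informative families whose conditional means cancel, plus one uninformative family), not values computed from (5).

```python
import math


def bi0_global(w, v):
    """BI_0 applied to the pooled data: W^2 / (W^2 + Var(Z | data, H0)).

    w[i] = E(Z_i | data, H0), v[i] = Var(Z_i | data, H0); assumes a positive denominator.
    """
    n = len(w)
    w_total = sum(w) / math.sqrt(n)   # W for Z = sum(Z_i) / sqrt(n)
    v_total = sum(v) / n              # Var(Z | data, H0) under independence
    return w_total ** 2 / (w_total ** 2 + v_total)


def bi_s(w, v):
    """Combined measure: sum numerators and denominators over units before dividing."""
    s = sum(wi * wi for wi in w)
    return s / (s + sum(v))


# Hypothetical values: five families with W_i = +1, five with W_i = -1 (no uncertainty),
# and one family with no information (W_i = 0, unit conditional variance).
w = [1.0] * 5 + [-1.0] * 5 + [0.0]
v = [0.0] * 10 + [1.0]
print(bi0_global(w, v), bi_s(w, v))
```

With these inputs the pooled measure is zero because the $W_i$ cancel, whereas the combined measure stays close to one, which is the behavior the text describes.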

5.4 An Empirical Comparison 

To illustrate the proposed Bayesian measures of information, we calculated them for various priors $\pi$ in a data set containing 21 ulcerative colitis (UC) families (Cho et al. (1998)). The choices of priors here were made for investigating the sensitivity to prior specification, so they may not reflect our real knowledge about the problem (e.g., we generally expect $\theta$ to be nonnegative in such problems). In Figure 5 the measure of information $BI_S$ is plotted in comparison with $BI_2^\pi$, which is calculated as described in the previous section for three different priors. Similar results are obtained using $BI_1^\pi$. In this example $\mathcal{RI}_1$ and $BI_S$ are almost identical; $\mathcal{RI}_1$ is therefore not shown. Note that the value of the parameter under the null hypothesis of no linkage is equal to zero, and, for this data set, the maximum likelihood estimates for the linkage parameter across the chromosome vary between $-0.74$ and $0.07$.

Fig. 5. The Bayesian measures are calculated for a data set containing 21 families. The solid line is $BI_S$; the dashed line corresponds to $BI_2^\pi$ calculated using a uniform prior on $(-1, 1)$; the dot-dashed line corresponds to $BI_2^\pi$ calculated using a uniform prior on $(\min(\theta_{ob}, \theta_0) - 0.1, \max(\theta_{ob}, \theta_0) + 0.1)$; the dotted line corresponds to $BI_2^\pi$ calculated using a uniform prior on $(\theta_{ob} - 0.1, \theta_{ob} + 0.1)$.

We note that the $BI_2^\pi$ measure calculated using a Uniform$(-1, 1)$ prior is very close to $BI_S$, which demonstrates the possibility of having very different priors that result in very similar measures. The Bayesian measure calculated with a prior having a narrower support, that is, uniform on the interval $(\min(\theta_{ob}, \theta_0) - 0.1, \max(\theta_{ob}, \theta_0) + 0.1)$, follows the same patterns but is uniformly smaller. Using a prior centered around the maximum likelihood estimate, uniform on the interval $(\theta_{ob} - 0.1, \theta_{ob} + 0.1)$, turns out to be very misleading because it gives values that are considerably too small (i.e., in comparison with the large-sample estimates given in Figure 2). We emphasize that symmetric uniform priors were used in Figure 5 simply to demonstrate potential substantial sensitivity to prior specification, as one often expects less erratic behavior from such symmetric and smooth prior specifications. The issue of sensitivity to the choice of prior is further discussed in Section 7.
6. THEORETICAL CONNECTIONS,
COMPARISONS AND CURIOSITIES 

6.1 The Asymptotic Equivalence to the 
Estimation Measure 

As we discussed previously, a central difficulty in measuring the relative amount of information is that its value will generally depend on the true value of the unknown parameter. One way to explore this dependence is to replace $\theta_{ob}$ in the definition of $\mathcal{RI}_0$ or $\mathcal{RI}_1$ by $\theta$ in a suitably defined neighborhood, and to plot it against $\theta$ in such a range to check its variability. The use of this type of relative information function was proposed in Meng and van Dyk (1996) for the purpose of measuring the rate of convergence of EM-type algorithms, where the function

$$
\mathcal{RI}_1(\theta) = \frac{\ell_{ob}(\theta_{ob}) - \ell_{ob}(\theta)}{Q(\theta_{ob}|\theta_{ob}) - Q(\theta|\theta_{ob})} \tag{30}
$$

was termed relative augmentation function. Note that $\mathcal{RI}_1$ is simply the value of this function at $\theta = \theta_0$. For simplicity of presentation, we will assume in the following and Section 6.2 that $\theta$ is univariate, though all the results are generalizable to multivariate $\theta$ by employing appropriate matrix notation and operations. We also assume all the regularity conditions as in Dempster, Laird and Rubin (1977) to guarantee the validity of taking differentiation under integration and for Taylor expansions.

It was shown in Meng and van Dyk (1996) that as $\theta \to \theta_{ob}$, $\mathcal{RI}_1(\theta)$ approaches the so-called fraction of observed information for the purpose of estimation:

$$
\mathcal{RI}_E = \frac{I_{ob}}{I_{co}}, \tag{31}
$$



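where the observed, complete and missing Fisher information are defined, as in Dempster, Laird and Rubin (1977),

$$
I_{ob} = I_{ob}(\theta_{ob}) = -\left.\frac{\partial^2 \log f(Y_{ob}|\theta)}{\partial\theta^2}\right|_{\theta=\theta_{ob}}, \tag{32}
$$

$$
I_{mi} = I_{mi}(\theta_{ob}) = E\left[-\frac{\partial^2 \log f(Y_{co}|Y_{ob},\theta)}{\partial\theta^2}\,\Big|\,Y_{ob},\theta\right]_{\theta=\theta_{ob}} \tag{33}
$$

and

$$
I_{co} = I_{co}(\theta_{ob}) = E\left[-\frac{\partial^2 \log f(Y_{co}|\theta)}{\partial\theta^2}\,\Big|\,Y_{ob},\theta\right]_{\theta=\theta_{ob}} = I_{ob} + I_{mi}, \tag{34}
$$

where the last identity is known as the "missing-data principle," and is a direct consequence of (15). The $\mathcal{RI}_E$ measure plays a key role in determining the rate of convergence of the EM algorithm and its various extensions (e.g., Dempster, Laird and Rubin (1977); Meng and Rubin (1991, 1993); Meng (1994); Meng and van Dyk (1997)).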

The above limiting result suggests that, when $\delta = \theta_0 - \theta_{ob}$ is small, we can study the behavior of $\mathcal{RI}_1$ via its connection to $\mathcal{RI}_E$, as we demonstrate in the next section. However, among all the measures we proposed, the measure $BI_S$ of (29) most closely resembles $\mathcal{RI}_E$ of (31). The main differences are the use of $\sum_{i=1}^n S^2(\theta_0|Y_{ob}^{(i)})$ in place of $I_{ob}$, and the fact that the Fisher information terms in $\mathcal{RI}_E$ are evaluated at $\theta = \theta_{ob}$, whereas for $BI_S$ they are evaluated at $\theta = \theta_0$. It is well known that, under regularity conditions, $\sum_{i=1}^n S^2(\theta_0|Y_{ob}^{(i)})/n$ will converge to the expected Fisher information under the null. Consequently, under the null, $BI_S$ and $\mathcal{RI}_E$ are asymptotically equivalent. This equivalence may suggest defining $BI_S$ directly in terms of the "observed Fisher information at $\theta_0$." However, although $I_{ob} = I_{ob}(\theta_{ob})$ is guaranteed to be nonnegative (definite) when $\theta_{ob}$ is in the interior of the parameter space $\Theta$, this is not necessarily true for $I_{ob}(\theta_0)$. Therefore, for small-sample problems for which the use of $I_{ob}$ is inadequate (e.g., when the MLE $\theta_{ob}$ is on the boundary of $\Theta$), the direct substitution of $I_{ob}$ by $I_{ob}(\theta_0)$ will not lead, in general, to a nonnegative measure. The $BI_S$ measure circumvents this problem by using the sum of individual squared scores instead of $I_{ob}(\theta_0)$, which guarantees that the resulting measure is inside the unit interval, and that it is consistent with $\mathcal{RI}_E$ for large samples. Therefore $BI_S$ can be viewed as a small-sample extension of $\mathcal{RI}_E$ in the neighborhood of the null.

6.2 Finite-Sample Equivalence in the 
Neighborhood of the Null 

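For both $\mathcal{RI}_1$ and $\mathcal{RI}_0$, their equivalence to $\mathcal{RI}_E$ in the neighborhood of $\theta_0$ can be established for finite-sample sizes. (Therefore, $\mathcal{RI}_E$ can also be defined as the value of either $\mathcal{RI}_1$ or $\mathcal{RI}_0$ when $\theta_{ob} = \theta_0$.) Specifically, denote by $\ell_{ob}^{(k)}$ the $k$th derivative of $\ell_{ob}(\theta)$ at $\theta = \theta_{ob}$, and

$$
Q_{ob}^{(i,j)} = \left.\frac{\partial^{i+j} Q(\theta_1|\theta_2)}{\partial\theta_1^i\,\partial\theta_2^j}\right|_{\theta_1=\theta_2=\theta_{ob}}. \tag{35}
$$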
It is proved in Appendix A.2 that, with $\delta = \theta_0 - \theta_{ob}$,

$$
\mathcal{RI}_1 = \mathcal{RI}_E + \frac{\mathcal{RI}_E\,Q_{ob}^{(3,0)} - \ell_{ob}^{(3)}}{3 I_{co}}\,\delta + O(\delta^2). \tag{36}
$$



In deriving this result, we have utilized the following well-known identities in the literature of the EM algorithm (e.g., Dempster, Laird and Rubin (1977); Meng and Rubin (1991)):

$$
\ell_{ob}^{(1)} = Q_{ob}^{(1,0)} = 0; \qquad Q_{ob}^{(1,1)} = I_{mi}. \tag{37}
$$



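Under the assumption that $Q(\theta|\theta_0)$ has a unique maximizer as a function of $\theta$, an assumption that is easily satisfied in most of the applications when EM is useful, we also prove in Appendix A.2 that

$$
\mathcal{RI}_0 = \mathcal{RI}_E + \left[3\mathcal{RI}_E\left(Q_{ob}^{(3,0)} + Q_{ob}^{(2,1)}\right) - 2\ell_{ob}^{(3)} - Q_{ob}^{(3,0)}\mathcal{RI}_E^2\right](3I_{co})^{-1}\delta + O(\delta^2). \tag{38}
$$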



These expansions are useful for comparing the first-order (in $\delta$) behavior of $\mathcal{RI}_1$ and $\mathcal{RI}_0$. For example, we suspect that, for many applications, $\mathcal{RI}_0$ is a conservative estimate of the actual relative information, whereas $\mathcal{RI}_1$ is a more accurate measure. One way to validate this or to identify situations where this conjecture is true is to compare the two coefficients of $\delta$ and to determine the appropriate conditions for $\mathcal{RI}_0 < \mathcal{RI}_1$ to the first order in the neighborhood of $\theta_0$ (away from the neighborhood the comparison is not very meaningful because $\mathcal{RI}_0$ can be seriously biased). Due to the complex nature of these two coefficients, we only present in the next section a simple example to illustrate the conservatism of $\mathcal{RI}_0$, and leave the general theoretical investigation to subsequent work.

We also remark here that when the true $\theta$ is believed to be close to $\theta_0$, a measure like $\mathcal{RI}_0$ can be used to construct reasonable bounds. For example, we can expect $\min\{\mathcal{RI}_0, \mathcal{RI}_1\}$ to be a reasonable lower bound and $\max\{\mathcal{RI}_0, \mathcal{RI}_1\}$ an upper bound for relative information, or we can use $\mathcal{RI}_{0.5} = \sqrt{\mathcal{RI}_0\,\mathcal{RI}_1}$ as a compromise. In future work, we intend to investigate the reliability and applicability of such bounds and compromise. Here we simply note a computational advantage of $\mathcal{RI}_{0.5}$ that follows from

$$
\mathcal{RI}_{0.5} = \left\{\frac{\max_\theta\,[Q(\theta|\theta_0) - Q(\theta_0|\theta_0)]}{Q(\theta_{ob}|\theta_{ob}) - Q(\theta_0|\theta_{ob})}\right\}^{1/2}, \tag{39}
$$



which avoids entirely the calculation of the observed-data log-likelihood function $\ell_{ob}(\theta)$, which is often harder to compute than the expected complete-data log-likelihood $Q(\theta|\theta')$. Furthermore, whenever $\mathcal{RI}_1$ and $\mathcal{RI}_0$ are close to each other, as in our real-data examples, $\mathcal{RI}_{0.5}$ will be practically the same as either $\mathcal{RI}_1$ or $\mathcal{RI}_0$.
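
The following sketch shows how the three quantities can be computed from $\ell_{ob}$ and $Q$ alone in a toy problem. It assumes the forms $\mathcal{RI}_1 = [\ell_{ob}(\theta_{ob}) - \ell_{ob}(\theta_0)]/[Q(\theta_{ob}|\theta_{ob}) - Q(\theta_0|\theta_{ob})]$ and $\mathcal{RI}_0 = \max_\theta[Q(\theta|\theta_0) - Q(\theta_0|\theta_0)]/[\ell_{ob}(\theta_{ob}) - \ell_{ob}(\theta_0)]$, whose product is the ratio appearing in (39). The model (a normal mean with known variance, $m$ of $n$ values observed) and all function names are illustrative choices, not part of the original analysis.

```python
import math


def loglik_ob(mu, y_ob, sigma2):
    """Observed-data log-likelihood of mu, up to an additive constant."""
    return -sum((y - mu) ** 2 for y in y_ob) / (2.0 * sigma2)


def q_fun(mu, mu_prime, y_ob, n, sigma2):
    """Q(mu | mu_prime): expected complete-data log-likelihood, up to a constant.

    Each of the n - m missing values contributes E[(y - mu)^2] = (mu_prime - mu)^2 + sigma2.
    """
    m = len(y_ob)
    observed = sum((y - mu) ** 2 for y in y_ob)
    missing = (n - m) * ((mu_prime - mu) ** 2 + sigma2)
    return -(observed + missing) / (2.0 * sigma2)


def q_argmax(mu_prime, y_ob, n):
    """Closed-form maximizer of Q(. | mu_prime) for this model (one EM step)."""
    m = len(y_ob)
    return (sum(y_ob) + (n - m) * mu_prime) / n


def relative_information(y_ob, n, mu0, sigma2=1.0):
    """Return (RI_0, RI_1, RI_0.5) under the assumed definitions above."""
    mu_hat = sum(y_ob) / len(y_ob)
    lod_ob = loglik_ob(mu_hat, y_ob, sigma2) - loglik_ob(mu0, y_ob, sigma2)
    gain_null = (q_fun(q_argmax(mu0, y_ob, n), mu0, y_ob, n, sigma2)
                 - q_fun(mu0, mu0, y_ob, n, sigma2))
    gain_alt = (q_fun(mu_hat, mu_hat, y_ob, n, sigma2)
                - q_fun(mu0, mu_hat, y_ob, n, sigma2))
    ri0 = gain_null / lod_ob
    ri1 = lod_ob / gain_alt
    ri_half = math.sqrt(gain_null / gain_alt)   # uses Q only, never loglik_ob
    return ri0, ri1, ri_half


print(relative_information([0.3, 1.1, -0.4, 0.9], n=10, mu0=0.0))
```

In this known-variance case all three values coincide with $m/n$, and the last one is obtained without evaluating the observed-data log-likelihood.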

6.3 An Illustrative Finite-Sample Comparison 

Let $Y_{co} = \{y_1, \ldots, y_n\}$ be i.i.d. samples from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown, and the null hypothesis is $H_0: \mu = \mu_0$. Suppose our observed data $Y_{ob}$ is a size-$m$ random sample of $Y_{co}$, where $0 < m < n$. Then it should be clear that the relative information is $r = m/n$ by any reasonable argument. Indeed, straightforward calculation shows $\mathcal{RI}_1 = r$ regardless of the actual value of $Y_{ob}$. However,

$$
\mathcal{RI}_0 = \frac{1}{r}\left[1 - \frac{\log(1 + (1 - r^2)(t_0^2/m))}{\log(1 + (t_0^2/m))}\right], \tag{40}
$$

where $t_0 = (\bar{y}_m - \mu_0)/\sqrt{\hat\sigma_m^2/m}$, which differs from the usual $t$-statistic (under the null) only due to the use of the MLE for $\sigma^2$, $\hat\sigma_m^2 = (1 - 1/m)s_m^2$, instead of the sample variance $s_m^2$. From (40), it is clear that $\mathcal{RI}_0$ approaches $r$ whenever $t_0^2/m$ is small, which implies that $\mathcal{RI}_0$ will recover (reasonably) the correct information when the null hypothesis is (approximately) correct.

In contrast, for a fixed sample size $m$, $\mathcal{RI}_0$ approaches zero if $t_0^2 \to \infty$ because for large $t_0$, $\mathcal{RI}_0$ behaves like $-r^{-1}\log(1 - r^2)/\log(1 + t_0^2/m)$. The reason is that the larger $t_0$ is, the stronger is the evidence that the null is false, and thus the more conservative we become when we impute $\mathrm{lod}(\mu, \mu_0|Y_{co})$ using $E[\mathrm{lod}(\mu, \mu_0|Y_{co}) \mid Y_{ob}, \mu_0]$. In other words, whereas $\mathcal{RI}_0$ is a good measure of how conservative the inference is, this example demonstrates that measuring conservatism in general is not necessarily the same as measuring the relative information. However, when the true $\mu$ is in a reasonable neighborhood of $\mu_0$, $\mathcal{RI}_0$ can be a valuable measure, especially because it is more robust to the posited alternative model and thus can serve as a useful diagnostic measure complementing $\mathcal{RI}_1$. We also note the potentially different impacts of the nuisance parameter on $\mathcal{RI}_0$ and $\mathcal{RI}_1$. When $\sigma^2$ is known, $\mathcal{RI}_0 = \mathcal{RI}_1 = r$. However, whereas $\mathcal{RI}_1$ remains the same when $\sigma^2$ is unknown, $\mathcal{RI}_0$ is greatly affected.
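
The two limiting behaviors can be seen numerically with a short sketch. It assumes that (40) has the form $\mathcal{RI}_0 = r^{-1}[1 - \log(1 + (1 - r^2)t_0^2/m)/\log(1 + t_0^2/m)]$, which is the form consistent with both limits stated in the text; the function name and the choice $m = 20$, $n = 40$ are illustrative.

```python
import math


def ri0_normal_unknown_variance(t0_sq, m, n):
    """RI_0 for the normal-mean example with unknown variance (assumed form of (40)).

    t0_sq is the squared statistic t_0^2; r = m / n is the fraction of observed data.
    """
    r = m / n
    x = t0_sq / m
    if x == 0.0:
        return r  # limiting value as t_0^2 / m -> 0
    # log1p keeps the ratio accurate when x is very small
    return (1.0 - math.log1p((1.0 - r * r) * x) / math.log1p(x)) / r


for t0_sq in (0.01, 1.0, 10.0, 100.0, 1e4, 1e8):
    print(t0_sq, round(ri0_normal_unknown_variance(t0_sq, m=20, n=40), 4))
```

The printed values start near $r = 0.5$ and decay slowly (at a logarithmic rate) toward zero as $t_0^2$ grows, while $\mathcal{RI}_1$ would stay at $0.5$ throughout.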
It is also informative to see how $BI_0$ of (26) and $BI_S$ of (29) compare in this simple problem. For reasons discussed previously, we fix here the nuisance parameter $\sigma^2$ at its MLE under the null, $\hat\sigma_0^2 = \sum_{i=1}^m (y_i - \mu_0)^2/m$. We therefore effectively have a single-parameter problem whose score function given a normal sample $\{y_1, \ldots, y_m\}$ is $S_m(\mu) = m(\bar{y}_m - \mu)/\sigma^2$ (where $\sigma^2$ is treated as known). Using the fact that $I_{mi}(\mu|Y_{ob}) = (n - m)/\sigma^2$, we have from (26), after setting $\sigma^2 = \hat\sigma_0^2$,

$$
BI_0 = \frac{m^2(\bar{y}_m - \mu_0)^2/\hat\sigma_0^4}{m^2(\bar{y}_m - \mu_0)^2/\hat\sigma_0^4 + (n - m)/\hat\sigma_0^2}
     = \frac{r t_0^2}{r t_0^2 + (1 - r)(1 + t_0^2/m)}. \tag{41}
$$

It should not be a surprise to see that $BI_0 = 0$ when $t_0 = 0$, that is, when $\mu_0$ happens to be the MLE of $\mu$, $\bar{y}_m$, a phenomenon we previously noted in Section 5.2. However, this simple example provides some clues on why this happens.

Recall that $BI_0$ was derived by assuming that the prior shrinks to the null. This is very strong prior information, and it inevitably influences our measure of the relative information. Consider the situation when $t_0 = 0$, in which case our observed data are completely consistent with our strong prior that $\theta = \theta_0$. In that sense, the information from the observed data is completely useless because it does not provide anything more than we a priori knew (or rather, assumed). Hence it is not a contradiction for $BI_0$ to declare zero relative information when clearly the relative information in the observed data should be $r$. It is not a contradiction because $BI_0$ has incorporated the prior information, whereas $r = m/n$ measures the relative information in the data under our posited model. This argument appears to be further substantiated when we consider the other extreme, namely, when $t_0^2 \to \infty$. By the same logic, in this case, the observed data are extremely informative as they provide strong evidence to contradict the prior, and the degree of contradiction is such that, even with more data, it is unlikely to be altered. Consequently, one can expect $BI_0$ to be close to 1, which indeed follows from (41) when $m$ is large because $BI_0 \to [1 + (r^{-1} - 1)m^{-1}]^{-1}$ when $t_0^2 \to \infty$.

The above discussion indicates a potential problem with any Bayesian measure, as it is inevitable that some prior information will "leak" into our measure of relative information in the data alone (for a specified test). When we have reliable prior information, it is a very interesting issue to investigate/debate whether our relative information should include the prior information (e.g., in the extreme case when we know the null is true for certain, the data become irrelevant, and one can always consider we have 100% information). Nevertheless, in cases where the prior is introduced for convenience, as is largely the case for our setting, it is desirable to reduce any unintended influence as much as possible. In this regard, it was a pleasant surprise to see that the $BI_S$ defined in (29) is able to recover the correct answer in this example. Specifically, letting $\sigma^2 = \hat\sigma_0^2$, the MLE of $\sigma^2$ under the null, (29) becomes

$$
BI_S = \frac{\sum_{i=1}^m (y_i - \mu_0)^2/\hat\sigma_0^4}{\sum_{i=1}^m (y_i - \mu_0)^2/\hat\sigma_0^4 + (n - m)/\hat\sigma_0^2}
     = \frac{m}{m + (n - m)} = r. \tag{42}
$$

It is curious that $BI_S$ has this ability of "removing" the impact of prior information that affected $BI_0$ in this finite-sample setting; how generally this result holds (even approximately) is a topic for future research.
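
A small numerical sketch makes the comparison concrete. It treats each observation as one unit, fixes $\sigma^2$ at its null MLE $\sum_{i=1}^m (y_i - \mu_0)^2/m$, and uses the score $(y_i - \mu_0)/\sigma^2$ per unit and $I_{mi} = (n - m)/\sigma^2$, as in the text; the function name and the data values are hypothetical.

```python
def bi0_and_bis(y_ob, n, mu0):
    """Return (BI_0, BI_S) for the normal-mean example with sigma^2 fixed at its null MLE."""
    m = len(y_ob)
    ybar = sum(y_ob) / m
    sigma0_sq = sum((y - mu0) ** 2 for y in y_ob) / m     # MLE of sigma^2 under H0
    i_mi = (n - m) / sigma0_sq                            # expected missing information
    global_score_sq = (m * (ybar - mu0) / sigma0_sq) ** 2              # S^2 of pooled data
    unit_score_sq = sum(((y - mu0) / sigma0_sq) ** 2 for y in y_ob)    # sum of unit S^2
    bi0 = global_score_sq / (global_score_sq + i_mi)
    bis = unit_score_sq / (unit_score_sq + i_mi)
    return bi0, bis


# Hypothetical data: m = 4 observed values out of n = 10.
print(bi0_and_bis([0.3, 1.1, -0.4, 0.9], n=10, mu0=0.0))
# Sample mean equal to mu0: the pooled score vanishes but the unit scores do not.
print(bi0_and_bis([-1.0, 1.0], n=6, mu0=0.0))
```

In both cases the second value equals $m/n$, while the first depends on how far the sample mean is from $\mu_0$ and drops to zero in the second example.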

6.4 Connections to the Two CR Information 
Lower Bounds 

Our large-sample measures have interesting con- 
nections with classic measures based on Fisher in- 
formation, as shown in Section 6.1. Are there simi- 
lar connections for the small-sample Bayesian mea- 
sures? The Bayesian measures are based on pos- 
terior variances of likelihood ratios or their loga- 
rithms. It turns out that there are several interest- 
ing connections, or at least analogies, in both fre- 
quentist and Bayesian literature. In a frequentist 
setting, just as the well-known Cramer-Rao lower 
bound provides a finite-sample information bound 
that is determined by the Fisher information, there 
is a more general Chapman-Robbins information 
bound (Chapman and Robbins (1951)) that is based 
on sampling variance of the likelihood ratio. Specifically, let $X$ have a multivariate pdf/pmf $f(X|\theta)$ with $\theta$ taking values in some parameter space $\Theta$. For each $\theta$, let $S_\theta = \{x: f(x|\theta) > 0\}$ be the support of $f(X|\theta)$. Suppose $T(X)$ is an unbiased estimator of a real-valued function $\tau(\theta)$. Let

$$
\Phi_\theta = \{\phi \in \Theta : \tau(\phi) \neq \tau(\theta) \text{ and } S_\phi \subset S_\theta\}.
$$

Then

$$
\mathrm{Var}(T(X)|\theta) \ge \sup_{\phi \in \Phi_\theta} \frac{[\tau(\phi) - \tau(\theta)]^2}{\mathrm{Var}(LR(\phi,\theta|X) \mid \theta)},
$$

where $LR(\phi,\theta|X)$ denotes the likelihood ratio function $f(X|\phi)/f(X|\theta)$.

This "second CR" bound is more general than the first one because it requires neither differentiability of $\tau(\theta)$ nor the existence of Fisher information (e.g., as in the case of discrete parameters). It provides an interesting analogy to the proposed Bayesian measures because it is based also on the variability of the likelihood ratio, where $\phi$ and $\theta$ can be arbitrarily apart. The central connection here is that while our large-sample measures have close ties with Fisher information (as detailed in Section 6.1), which is also intimately connected with the "first CR" bound (i.e., Cramer-Rao bound), our small-sample measures are based on variances of likelihood ratio, which is connected with the "second CR" bound. The fact that the second CR bound is more general than the first CR bound is also consistent with our expectation that our Bayesian measures ultimately should be more general than the likelihood-based large-sample measures, though currently this is still just an expectation, not a realization.
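
For a concrete feel of the second CR bound, consider the following sketch under an assumed toy model: $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$, $T(X)$ the sample mean and $\tau(\theta) = \theta$. In that model $\mathrm{Var}(LR(\phi,\theta|X)|\theta) = \exp(n\Delta^2) - 1$ with $\Delta = \phi - \theta$, so the ratio inside the supremum is available in closed form. The function names and the grid of $\Delta$ values are illustrative.

```python
import math


def cr2_ratio(delta, n):
    """[tau(phi) - tau(theta)]^2 / Var(LR(phi, theta | X) | theta), with delta = phi - theta.

    Assumes n i.i.d. N(theta, 1) observations, for which Var(LR) = exp(n * delta^2) - 1.
    """
    return delta ** 2 / math.expm1(n * delta ** 2)


def cr2_bound(n, deltas):
    """Approximate the supremum over phi by a maximum over a finite grid of deltas."""
    return max(cr2_ratio(d, n) for d in deltas)


n = 5
deltas = [10.0 ** (-k / 2.0) for k in range(0, 13)]   # from 1 down to 1e-6
print(cr2_bound(n, deltas), 1.0 / n)
```

Here the supremum is approached as $\phi \to \theta$ and equals $1/n$, the variance of the sample mean, so in this smooth model the second bound reproduces the first; its added generality matters when derivatives or Fisher information are unavailable.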

6.5 Connections Between Likelihood Ratio and 
Bayes Factors 

The variances in our Bayesian measures are more general than the one used by the second CR bound because we average over not only the missing data but also the posterior distribution of $\theta$. Examining the posterior distribution of the entire likelihood ratio might seem a case of "using data twice," but the following several identities suggest that such a practice is natural from the Bayesian point of view (indeed, the use of posterior distribution of the likelihood ratio has been previously advocated by Dempster (1997)).

First, suppose we have a proper prior $\pi(\theta)$; then it is easy to verify that

$$
E[LR(\theta_0,\theta|Y_{ob}) \mid Y_{ob}]
  = \int \frac{f(Y_{ob}|\theta_0)}{f(Y_{ob}|\theta)}\,\frac{f(Y_{ob}|\theta)\pi(\theta)}{f_\pi(Y_{ob})}\,d\theta
  = \frac{f(Y_{ob}|\theta_0)}{f_\pi(Y_{ob})} \equiv BF_{ob}, \tag{43}
$$

where $f_\pi(Y_{ob}) = \int f(Y_{ob}|\theta)\pi(\theta)\,d\theta$. (Note that here we assume $\theta_0$ is fixed at a known value.)



In other words, the posterior mean of our likelihood ratio is simply the well-known Bayes factor for assessing the probability of the model under $\theta = \theta_0$ relative to the model under $\theta \sim \pi(\theta)$. This shows that the Bayes factor is a very natural generalization of likelihood ratio by taking into account our uncertainty in $\theta$ while assessing the evidence in the data against the hypothesized null value $\theta = \theta_0$. It also shows that it is quite natural to consider posterior quantification of the likelihood ratio itself. Incidentally, applying identity (43) first with $Y_{ob} = Y_{co}$ and then averaging the resulting identity over the posterior predictive distribution $f(Y_{co}|Y_{ob})$, we also obtain the following intriguing result:

$$
E[BF_{co} \mid Y_{ob}] = E[LR(\theta_0,\theta|Y_{co}) \mid Y_{ob}]
  = E[LR(\theta_0,\theta|Y_{ob}) \mid Y_{ob}] = BF_{ob}. \tag{44}
$$



In other words, the observed-data Bayes factor $BF_{ob}$ is the posterior average of any of these three quantities: the observed-data likelihood ratio, the complete-data likelihood ratio, or the complete-data Bayes factor. Identities (23), (43) and (44) together demonstrate the "coherence" of likelihood ratio and Bayes factor as well as between them. Identity (44) also suggests an easy way of computing $BF_{ob}$ via Monte Carlo averaging of complete-data or observed-data likelihood ratios. We note, however, that the posterior distributions of $BF_{co}$, $LR(\theta_0,\theta|Y_{co})$ and $LR(\theta_0,\theta|Y_{ob})$ are generally different. In particular, because of (23) and (43), we have that

$$
\max\{\mathrm{Var}[BF_{co} \mid Y_{ob}],\ \mathrm{Var}[LR(\theta_0,\theta|Y_{ob}) \mid Y_{ob}]\}
  \le \mathrm{Var}[LR(\theta_0,\theta|Y_{co}) \mid Y_{ob}]. \tag{45}
$$
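
The identity between the posterior mean of the observed-data likelihood ratio and the Bayes factor can be checked exactly when the prior is discrete. The sketch below is an illustration under assumed choices (a binomial likelihood, a uniform prior on a grid of 19 values, and hypothetical data $y = 7$ successes in $m = 10$ trials); it also returns the posterior variance of the likelihood ratio, the kind of quantity that enters the Bayesian measures.

```python
def binom_kernel(theta, y, m):
    """Binomial likelihood without the binomial coefficient (it cancels in all ratios)."""
    return theta ** y * (1.0 - theta) ** (m - y)


def lr_posterior_summary(y, m, theta0, grid, prior):
    """Posterior mean and variance of LR(theta0, theta | Y_ob), plus the Bayes factor."""
    lik = [binom_kernel(t, y, m) for t in grid]
    marginal = sum(l * p for l, p in zip(lik, prior))         # f_pi(Y_ob)
    post = [l * p / marginal for l, p in zip(lik, prior)]     # pi(theta | Y_ob)
    lik0 = binom_kernel(theta0, y, m)
    lr = [lik0 / l for l in lik]                              # LR(theta0, theta | Y_ob)
    mean = sum(r * w for r, w in zip(lr, post))
    var = sum((r - mean) ** 2 * w for r, w in zip(lr, post))
    bf = lik0 / marginal                                      # BF_ob
    return mean, var, bf


grid = [0.05 * k for k in range(1, 20)]          # 0.05, 0.10, ..., 0.95
prior = [1.0 / len(grid)] * len(grid)            # proper discrete uniform prior
mean, var, bf = lr_posterior_summary(y=7, m=10, theta0=0.5, grid=grid, prior=prior)
print(mean, bf, var)
```

The first two printed numbers agree up to rounding error, as the identity requires. With continuous, unbounded priors a Monte Carlo average of likelihood ratios can have very large variance, so the simulation size would need care in practice.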



Given the clear interpretation and utility of the posterior mean of the likelihood ratio, we would naturally consider the posterior variance of the likelihood ratio. That is, we can measure the posterior uncertainty in our likelihood ratio evidence. These are exactly the quantities used in defining $BI_1^\pi$ in (24), where the numerator and denominator are respectively the posterior variances of the observed-data and complete-data likelihood ratios. The following equivalent expression of $BI_1^\pi$ further demonstrates how $BI_1^\pi$ measures relative "flatness" in the likelihood ratio surfaces:

$$
BI_1^\pi = \frac{\mathrm{Cov}_\pi[LR(\theta_0,\theta|Y_{ob}),\ LR(\theta,\theta_0|Y_{ob})]}{\mathrm{Cov}_{\pi,\theta_0}[LR(\theta_0,\theta|Y_{co}),\ LR(\theta,\theta_0|Y_{co})]}, \tag{46}
$$

where $\mathrm{Cov}_\pi$ is the covariance operator with respect to the prior $\pi(\theta)$, and $\mathrm{Cov}_{\pi,\theta_0}$ is with respect to $f(Y_{co}|Y_{ob},\theta_0)\pi(\theta)$. In other words, the flatness of the likelihood ratio surfaces is measured by the covariance of the likelihood ratio and its reciprocal. Although this expression itself is intuitive because a positive function is flat if and only if it is proportional to its reciprocal, the equivalence between (24) and (46) is a bit curious because (24) is based on posterior variance whereas (46) is based on prior covariance.

6.6 Connections to Entropy and 

It would be a serious oversight if we do not em- 
phasize the connections of the information measures 
we discuss in this paper to the vast literature on en- 
tropy. Indeed, essentially all measures we presented 
have an entropy flavor, from the large-sample ones 
based on Kullback-Leibler information to the small-sample
ones involving second-order entropy in the
form of $\int (\log p(\theta))^2 p(\theta)\,d\theta$ (see Zellner (2003)). This
is very natural given that the entropy is a funda- 
mental type of information measure (e.g., Akaike 
(1985)). Indeed, much of the classic results on infor- 
mation measure in optimal sequential designs, which 
our genetic applications resemble (i.e., as one needs 
to decide the next step given what has been ob- 
served), are based on entropy-like quantities and 
their generalizations. This includes both Kullback-Leibler
information and Chernoff information (Chernoff (1979)).
A central difference between that literature
and our current proposals is that the existing
literature focuses on quantifying the absolute
amount of information in an experiment/design, 
whereas our main objective here is to quantify the 
relative amount of information compared to the ab- 
solute amount of information that we would have if 
there were no missing data (e.g., known IBD sharing 
in linkage studies). Furthermore, we investigate two 
sets of relative information, depending on whether 
we can assume the true parameter is in a neighbor- 
hood of the null or not. To the best of our knowl- 
edge, our study is the first serious investigation of 
the roles of null and alternative hypotheses in mea- 
suring relative information. 

Because our Bayesian measures $BI_1^\pi$ and $BI_2^\pi$ are defined as ratios of variances, it is also important to emphasize their connections to the regression $R^2$ and to other measures of association/correlation such as the linkage disequilibrium measure (e.g., Devlin and Risch (1995)). These measures are related to Fisher information and can also be used to estimate relative information. The main differences are that ours are defined via the posterior variability of the whole likelihood ratio or log-likelihood ratio, instead of sampling variances of individual statistics or variables. More details on measures of association/correlation used to quantify relative information can be found elsewhere (Nicolae (2006b)).

7. LIMITATIONS AND FURTHER WORK 

7.1 Further Theoretical and Methodological 
Work 

Clearly much remains to be done, especially for 
the small-sample problems. With large samples, we 
believe the measures we proposed, especially $\mathcal{RI}_1$,
satisfy essentially all five criteria as discussed in 
Section 1.2. For small samples, the various Bayesian 
measures we proposed, while all satisfy the second 
criterion, have pros and cons regarding the rest of 
the criteria. The most pronounced problem, of course, 
is the choice of a general-purpose "default prior." 
Here we emphasize that the desire for "general pur- 
pose" is motivated by the observation that in many 
applications the investigators need to compute the 
information measures for many data sets (e.g., dif- 
ferent families or pedigrees and different loci in link- 
age analysis; different tests for different haplotype 
models in the association studies) under time con- 
straints. Therefore it is typically not feasible to con- 
struct specific priors for each data set at hand, nor 
is it desirable given that the purpose of hypothe- 
sis testing, in the genetic applications we are inter- 
ested in, has more of a screening nature. A require- 
ment for constructing problem-specific priors would 
be typically viewed as too much of a burden to be 
practically appealing. On the other hand, standard 
recipes for constructing "default" priors do not seem 
to be generally applicable either. For example, the 
use of Jeffreys' prior is typically out of the question
because the calculation of the expected Fisher
information requires us to specify a reliable distribution
over the state space of $Y_{ob}$ for arbitrary value
of $\theta$, which is typically very hard, if not impossible,
to do. Furthermore, the properties of Jeffreys' prior 
are not clear when we try to avoid the use of Fisher 
information in the first place. 

Second, whereas $BI_S$ provides a nice connection between small-sample and large-sample measures in the neighborhood of $\theta_0$, we currently do not have such a measure when the null is far from the truth. This is of great theoretical and practical concern, at least in the context of genetic studies, because the regions where there is strong evidence against the null are precisely the regions we try to identify. One possible strategy is to start by estimating $\theta$ based on the aggregated data (e.g., using data from the other families), and then use a prior that shrinks toward this estimated $\theta$ when computing the information measure for individual components (e.g., families). In future work we plan to evaluate this strategy, as a part of the general investigation of the sensitivity of our Bayesian measures to prior specifications once we move out of the neighborhood of the null.

Third, even for large samples, our measures $\mathcal{RI}_0$ and $\mathcal{RI}_1$ can be sensitive to the posited linkage or association model, which may or may not capture the real biological process that leads to the linkage or association. This would be particularly true for $\mathcal{RI}_1$, which relies more heavily on the model associated with the test than $\mathcal{RI}_0$. Although such sensitivity is inevitable because without a specific alternative model the very notion of relative information may not even be defined, as we emphasized previously, it is important to understand to what degree our information measures can change with our working model. Both theoretical and empirical investigations are needed, especially for classes of problems that are common in practice. Also needed are investigations of the impact of nuisance parameters on these measures. The haplotype association examples involve nuisance parameters, for example, population genotype risks or population haplotype frequencies, and $\mathcal{RI}_1$ seems to work adequately in practice. Nevertheless, it would be interesting to see if further refinements are possible. The illustrative example of Section 6.3 strongly suggests that further research is necessary to investigate the possible complications caused by the nuisance parameters, especially for $\mathcal{RI}_0$.

7.2 Other Applications 

The genetic applications presented in this paper 
focus on the allele-sharing linkage methods and the 
haplotype-based association studies, but there are 
many other areas in genetics where measuring relative
information is important. For example, in
past years the markers used in genome-wide searches
for susceptibility loci were mostly microsatellites.
These are markers that have many alleles, and are 
generally very informative, but are not very common 
across the genome. Because the applications focused 
on small regions of the genome, this lack of abun- 
dance of the microsatellites has led to the still in- 
creasing popularity of the SNPs as genetic markers. 



The SNPs are not as informative as the microsatellites,
but they are highly abundant. Also new technology
platforms such as the Affymetrix GeneChip
Mapping 10K, 100K and 500K Arrays (Matsuzaki,
Loi and Dong (2004)) are available for SNP genotyping,
and they come with a substantial reduction
in cost. Given that both the microsatellites and the 
SNPs are currently used in gene-mapping studies, 
a fundamental and practical question is how many 
SNPs we need in order to obtain the same amount 
of information as obtained by using microsatellites. 
Differences between SNPs and microsatellites have 
been investigated for linkage (e.g., Kruglyak (1997); 
Schaid et al. (2004); Evans and Cardon (2004); Mid- 
dleton et al. (2004); Thalamuthu et al. (2005)), and 
measures of relative information extracted have been 
proposed (Teng and Siegmund (1998)), but the an- 
swers to similar questions will be different for dif- 
ferent applications. We plan to further explore the 
use of the proposed measures of information to other 
problems of this sort. The comparisons between the 
relative information of sets of SNPs to that of sets of 
microsatellites (relative to the underlying complete 
information) will allow us to make sensible compar- 
isons of the maps for a particular study purpose. 

The gene-mapping research has focused recently 
on genome-wide association studies that are thought 
to have better power to localize genes contribut- 
ing more modestly to disease susceptibility. In these 
studies, new measures are needed for quantifying the 
loss in information due to untyped SNPs, or even 
SNPs that have not been discovered. Also, novel 
tools for measuring information are necessary in choos- 
ing a subset of "tagging" SNPs to type for a dis- 
ease project based on the data from the HAPMAP 
project (The International HapMap Consortium (2003)). 

Other possible applications are in testing for gene- 
environment interaction. This can be done in both 
linkage and association studies, and can increase the 
power of detecting risk factors. In most of these 
studies, the environmental and the clinical data are 
also incomplete. A natural question then arises: "what 
is the most efficient way to allocate the resources: 
what percentage should be devoted to collect more 
genetic information and what percentage should be 
used to collect more covariate information?" The an- 
swer depends again on the specific study, and the 
problem is more complicated because the environ- 
mental and clinical information can be subject to 
much more complicated missing-data patterns, often 
due to unknown reasons. Research is clearly needed
in this direction to explore to what extent it is possible
to sensibly measure the relative information for
guiding the allocation of resources, and we hope the
general framework we set up in this paper provides
a starting point, if not a solution.

APPENDIX 

A.1 Proof for Section 5.2 

In order to prove the shrinking prior limit results 
in Section 5.2, we need the following lemma. 

Lemma A.1. Let $t$ be a fixed real number, and let
$a_i$ and $b_i$, $i = 1,2,3,4$, be real continuous functions
defined on an open interval containing $t$, such that
$a_i$ and $b_i$ are three times differentiable in a neighborhood
of $t$. Let $\tilde a_i(\delta;t) = \int_{t-\delta}^{t+\delta} a_i(x)\,dx$, and similarly
for $\tilde b_i(\delta;t)$, where $i = 1,2,3,4$. If

(47)  $a_1(t)a_2(t) = a_3(t)a_4(t), \qquad b_1(t)b_2(t) = b_3(t)b_4(t),$

but

(48)  $b_1''(t)b_2(t) + b_1(t)b_2''(t) - b_3''(t)b_4(t) - b_3(t)b_4''(t) \neq 0,$

then

$\lim_{\delta\to 0} \dfrac{\tilde a_1(\delta;t)\tilde a_2(\delta;t) - \tilde a_3(\delta;t)\tilde a_4(\delta;t)}{\tilde b_1(\delta;t)\tilde b_2(\delta;t) - \tilde b_3(\delta;t)\tilde b_4(\delta;t)}$
$\quad = \{a_1''(t)a_2(t) + a_1(t)a_2''(t) - a_3''(t)a_4(t) - a_3(t)a_4''(t)\}$
$\qquad \cdot \{b_1''(t)b_2(t) + b_1(t)b_2''(t) - b_3''(t)b_4(t) - b_3(t)b_4''(t)\}^{-1}.$

Proof. The proof follows from the simple Taylor
expansion

$\tilde a_i(\delta;t) = 2a_i(t)\delta + \tfrac{1}{3}a_i''(t)\delta^3 + O(\delta^4),$

and conditions (47) and (48). □
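The limit in Lemma A.1 can be checked numerically. The sketch below is only an illustration: it reads the windowed quantities as integrals over $[t-\delta, t+\delta]$, and the functions `A_FUNCS` and `B_FUNCS` are arbitrary choices (not taken from the paper) that satisfy condition (47) at $t = 0$. The helper `simpson` is a plain composite Simpson rule written for this example.

```python
import math

def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * f(lo + j * h)
    return total * h / 3.0

T = 0.0  # the fixed point t of the lemma

# Illustrative functions (assumptions, not from the paper). At T they satisfy
# a1*a2 = a3*a4 = 1 and b1*b2 = b3*b4 = 1, which is condition (47).
A_FUNCS = [math.exp, lambda x: math.exp(-x), lambda x: 1.0, lambda x: 1.0]
B_FUNCS = [math.exp, lambda x: math.exp(-x + 3.0 * x * x),
           lambda x: 1.0, lambda x: 1.0]

def window_ratio(delta):
    """Ratio of windowed products appearing on the left side of the lemma."""
    a = [simpson(f, T - delta, T + delta) for f in A_FUNCS]
    b = [simpson(f, T - delta, T + delta) for f in B_FUNCS]
    return (a[0] * a[1] - a[2] * a[3]) / (b[0] * b[1] - b[2] * b[3])

# Second derivatives at T: a1'' = a2'' = 1, b1'' = 1, b2'' = 7, and the
# constant functions contribute nothing, so the right side is 2 / 8.
LEMMA_LIMIT = (1.0 + 1.0) / (1.0 + 7.0)

if __name__ == "__main__":
    for delta in (0.1, 0.03, 0.01):
        print(delta, window_ratio(delta), LEMMA_LIMIT)
```

As $\delta$ shrinks, the printed ratio should approach `LEMMA_LIMIT`; here condition (48) holds because the denominator combination equals 8.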



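Proposition A.1. Let $\pi$ be $U(\theta_0 - \delta, \theta_0 + \delta)$.
Then

(49)  $\lim_{\delta\to 0} \mathcal{BI}^{\pi}_k = \dfrac{S^2(\theta_0|Y_{ob})}{S^2(\theta_0|Y_{ob}) + I_{mi}(\theta_0|Y_{ob})}, \qquad k = 1,2.$

Proof. Let $a_1(\theta) = b_1(\theta) = \exp[\mathrm{lod}(\theta,\theta_0|Y_{ob})]$,
$b_2(\theta) = E[\exp[\mathrm{lod}(\theta_0,\theta|Y_{co})]\,|\,Y_{ob},\theta_0]$ and $a_2(\theta) = a_1^{-1}(\theta)$.
Then, as in (46), it is straightforward to verify that

(50)  $\mathcal{BI}^{\pi}_1 = \dfrac{\int a_1(\theta)\pi(\theta)\,d\theta \int a_2(\theta)\pi(\theta)\,d\theta - 1}{\int b_1(\theta)\pi(\theta)\,d\theta \int b_2(\theta)\pi(\theta)\,d\theta - 1}.$

We can then apply Lemma A.1 with $a_3 = a_4 = b_3 = b_4 = 1$.
The result for $k = 1$ in (49) then follows because

$a_1''(\theta_0) = \ell''(\theta_0|Y_{ob}) + S^2(\theta_0|Y_{ob}),$

$a_2''(\theta_0) = -\ell''(\theta_0|Y_{ob}) + S^2(\theta_0|Y_{ob})$

and

$b_2''(\theta_0) = E[-\ell''(\theta_0|Y_{co}) + S^2(\theta_0|Y_{co})\,|\,Y_{ob},\theta_0]$
$\qquad = 2I_{mi}(\theta_0|Y_{ob}) - \ell''(\theta_0|Y_{ob}) + S^2(\theta_0|Y_{ob}).$

Note that condition (47) holds because $a_i(\theta_0) =
b_i(\theta_0) = 1$ for all $i$.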

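For $k = 2$, the limit can be calculated by observing
that

$\mathcal{BI}^{\pi}_2 = \dfrac{1}{1 + \dfrac{\mathrm{Var}\left[\log \dfrac{P(Y_{co}|Y_{ob},\theta)}{P(Y_{co}|Y_{ob},\theta_0)}\,\Big|\,Y_{ob}\right]}{\mathrm{Var}[\mathrm{lod}(\theta,\theta_0|Y_{ob})\,|\,Y_{ob}]}}$

and then calculating the limit of the ratio in the
denominator. A little algebra shows that this ratio
can be expressed as

(51)  $\dfrac{\int a_1(\theta)\pi(\theta)\,d\theta \int a_2(\theta)\pi(\theta)\,d\theta - \left[\int a_3(\theta)\pi(\theta)\,d\theta\right]^2}{\int b_1(\theta)\pi(\theta)\,d\theta \int b_2(\theta)\pi(\theta)\,d\theta - \left[\int b_3(\theta)\pi(\theta)\,d\theta\right]^2},$

where $a_1(\theta) = b_1(\theta)$ are the same as in (50), but

$a_2(\theta) = E[(\mathrm{lod}(\theta,\theta_0|Y_{co}) - \mathrm{lod}(\theta,\theta_0|Y_{ob}))^2 \cdot \exp(\mathrm{lod}(\theta,\theta_0|Y_{co}))\,|\,Y_{ob},\theta_0],$

$a_3(\theta) = E[(\mathrm{lod}(\theta,\theta_0|Y_{co}) - \mathrm{lod}(\theta,\theta_0|Y_{ob})) \cdot \exp(\mathrm{lod}(\theta,\theta_0|Y_{co}))\,|\,Y_{ob},\theta_0],$

$b_2(\theta) = \mathrm{lod}^2(\theta,\theta_0|Y_{ob})\,a_1(\theta)$ and

$b_3(\theta) = \mathrm{lod}(\theta,\theta_0|Y_{ob})\,a_1(\theta).$

To apply Lemma A.1, we let $a_4 = a_3$ and $b_4 = b_3$.
Noting that $a_i(\theta_0) = b_i(\theta_0) = 0$ for all $i = 2,3,4$ [and
hence condition (47) holds], we only need to compute
$a_2''(\theta_0)$ and $b_2''(\theta_0)$ in order to obtain the limit.
This calculation is facilitated by the formula

$(g^2 \exp(f))'' = 2g'^2 \exp(f) + 2gg'' \exp(f) + 4gg'f' \exp(f)$
$\qquad + g^2 f'' \exp(f) + g^2 f'^2 \exp(f).$

The result then follows because

$b_2''(\theta_0) = 2S^2(\theta_0|Y_{ob})$

and

$a_2''(\theta_0) = 2E[(S(\theta_0|Y_{co}) - S(\theta_0|Y_{ob}))^2\,|\,Y_{ob},\theta_0]$
$\qquad = 2I_{mi}(\theta_0|Y_{ob}).$ □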
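A small numerical illustration of the shrinking-prior limit for $k = 1$ may help. The sketch below makes several assumptions that are not in the paper: a toy model with $n$ i.i.d. $N(\theta, 1)$ draws of which $m$ are observed and $k$ are missing, the convention that $\mathrm{lod}(\theta,\theta_0|Y)$ is the log-likelihood ratio of $\theta$ against $\theta_0$, the ratio (50) read as a quotient of prior-averaged products minus one, and the limit (49) read as $S^2/(S^2 + I_{mi})$. In this toy model $S(\theta_0|Y_{ob}) = m(\bar y - \theta_0)$ and $I_{mi} = k$. All names (`bi1`, `prior_mean`, and so on) are hypothetical.

```python
import math

def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * f(lo + j * h)
    return total * h / 3.0

# Toy setting (an assumption for illustration only).
M_OBS, K_MIS, YBAR, THETA0 = 4, 3, 0.3, 0.0

def lod_ob(theta):
    """Observed-data log-likelihood ratio of theta against THETA0."""
    return 0.5 * M_OBS * ((YBAR - THETA0) ** 2 - (YBAR - theta) ** 2)

def a1(theta):
    return math.exp(lod_ob(theta))

def a2(theta):
    return math.exp(-lod_ob(theta))

def b2(theta):
    # E[exp(lod(THETA0, theta | Y_co)) | Y_ob, THETA0]: for N(theta, 1) data,
    # each missing draw contributes a factor exp((theta - THETA0)^2).
    return math.exp(-lod_ob(theta) + K_MIS * (theta - THETA0) ** 2)

def prior_mean(f, delta):
    """Average of f under the uniform prior on (THETA0 - delta, THETA0 + delta)."""
    return simpson(f, THETA0 - delta, THETA0 + delta) / (2.0 * delta)

def bi1(delta):
    """The ratio (50), with b1 equal to a1."""
    num = prior_mean(a1, delta) * prior_mean(a2, delta) - 1.0
    den = prior_mean(a1, delta) * prior_mean(b2, delta) - 1.0
    return num / den

SCORE = M_OBS * (YBAR - THETA0)            # S(theta0 | Y_ob)
LIMIT = SCORE ** 2 / (SCORE ** 2 + K_MIS)  # S^2 / (S^2 + I_mi)

if __name__ == "__main__":
    for delta in (0.1, 0.03, 0.01):
        print(delta, bi1(delta), LIMIT)
```

Under these assumptions the printed values of `bi1(delta)` should move toward `LIMIT` as `delta` decreases, which is the behavior the proposition describes.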

A.2 Derivations for Section 6.2 

The derivations are based on the following lemma, 
which is trivial to verify using the Taylor expansion. 

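Lemma A.2. Let $f$ and $g$ be continuous functions
defined on an open interval containing zero,
such that $f(\delta) = a_1 + a_2\delta + O(\delta^2)$ and
$g(\delta) = b_1 + b_2\delta + O(\delta^2)$ as $\delta \to 0$. Then

$\dfrac{f(\delta)}{g(\delta)} = \dfrac{a_1}{b_1} + \dfrac{a_2 - b_2(a_1/b_1)}{b_1}\,\delta + O(\delta^2).$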
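The first-order ratio expansion of Lemma A.2 is easy to verify numerically. In the sketch below, the coefficients and the quadratic remainder terms of `f` and `g` are arbitrary illustrative choices (not from the paper), with $b_1 \neq 0$ assumed so that the ratio is well defined near zero. The check is that the approximation error, divided by $\delta^2$, stays bounded as $\delta$ shrinks.

```python
# Illustrative coefficients (assumptions): f(d) = A1 + A2*d + O(d^2),
# g(d) = B1 + B2*d + O(d^2), with B1 nonzero.
A1, A2 = 2.0, 3.0
B1, B2 = 4.0, -1.0

def f(d):
    return A1 + A2 * d + 0.5 * d * d

def g(d):
    return B1 + B2 * d + 2.0 * d * d

def first_order_ratio(d):
    """First-order expansion of f(d) / g(d) given by the lemma."""
    return A1 / B1 + (A2 - B2 * (A1 / B1)) / B1 * d

def scaled_error(d):
    """Approximation error divided by d^2; bounded if the error is O(d^2)."""
    return abs(f(d) / g(d) - first_order_ratio(d)) / d ** 2

if __name__ == "__main__":
    for d in (1e-1, 1e-2, 1e-3):
        print(d, f(d) / g(d), first_order_ratio(d), scaled_error(d))
```

For these choices the scaled error settles near a constant rather than growing, which is what an $O(\delta^2)$ remainder implies.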

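As in Section 5, we let $\delta = \theta_0 - \hat\theta_{ob}$. For $\mathcal{RI}_1$, we
need to expand both $\ell_{ob}(\theta_0)$ and $Q(\theta_0|\hat\theta_{ob})$, as functions
of $\delta$. Using the notation given in Section 6.2
and (37), we have

(52)  $\ell_{ob}(\theta_0) - \ell_{ob}(\hat\theta_{ob}) = -\dfrac{I_{ob}}{2}\delta^2 + \dfrac{\ell_{ob}^{(3)}}{6}\delta^3 + O(\delta^4)$

and

(53)  $Q(\theta_0|\hat\theta_{ob}) - Q(\hat\theta_{ob}|\hat\theta_{ob}) = \dfrac{Q^{(2,0)}}{2}\delta^2 + \dfrac{Q^{(3,0)}}{6}\delta^3 + O(\delta^4).$

Expansion (36) then follows directly from Lemma A.2.

To establish a similar expansion for $\mathcal{RI}_0$, let $\hat\theta_Q$
be the maximizer of $Q(\theta|\theta_0)$; recall we assume that
$\hat\theta_Q$ is unique. Then $Q^{(1,0)}(\hat\theta_Q|\theta_0) = 0$ and

(54)  $\mathcal{RI}_0 = \dfrac{Q(\hat\theta_Q|\theta_0) - Q(\theta_0|\theta_0)}{\ell_{ob}(\hat\theta_{ob}) - \ell_{ob}(\theta_0)}.$

However, even when $\delta = \theta_0 - \hat\theta_{ob}$ is small, it is not
immediate that $\hat\theta_Q$ would be close to $\hat\theta_{ob}$ as well. We
now show that when $\delta$ is small enough, $Q^{(1,0)}(\theta_0|\theta_0)$
and $Q^{(1,0)}(\hat\theta_{ob}|\theta_0)$ have opposite signs. Consequently,
$\hat\theta_Q$, the unique solution of $Q^{(1,0)}(\theta|\theta_0) = 0$, must be
between $\theta_0$ and $\hat\theta_{ob}$, and hence $|\hat\theta_Q - \hat\theta_{ob}| \le |\delta|$.

To see this, we first expand $g(\theta) = Q^{(1,0)}(\theta|\theta)$ around
$\hat\theta_{ob}$ to obtain

(55)  $Q^{(1,0)}(\theta_0|\theta_0) = g(\hat\theta_{ob}) + [Q^{(2,0)} + Q^{(1,1)}]\delta + O(\delta^2).$

But the following general result, proved in Meng
(2000):

(56)  $\ell_{ob}^{(k)}(\theta) = \dfrac{d^{k-1}}{d\theta^{k-1}} Q^{(1,0)}(\theta|\theta)$ for any $k > 0$,

implies that $g(\hat\theta_{ob}) = 0$ and $Q^{(2,0)} + Q^{(1,1)} = \ell_{ob}^{(2)} = -I_{ob}$.
Consequently,

(57)  $Q^{(1,0)}(\theta_0|\theta_0) = -I_{ob}\delta + O(\delta^2).$

For $Q^{(1,0)}(\hat\theta_{ob}|\theta_0)$, using the notation in (15) and
(35), we have

(58)  $Q^{(1,0)}(\hat\theta_{ob}|\theta_0) = \ell_{ob}'(\hat\theta_{ob}) + H^{(1,0)}(\hat\theta_{ob}|\theta_0)$
$\qquad = H^{(2,0)}(\theta_0|\theta_0)(\hat\theta_{ob} - \theta_0) + O(\delta^2)$
$\qquad = I_{mi}(\theta_0)\delta + O(\delta^2),$

where $I_{mi}(\theta)$ is as defined in (33). Since both $I_{ob}$ and
$I_{mi}(\theta_0)$ are positive, we conclude from (57) and (58)
that $Q^{(1,0)}(\hat\theta_{ob}|\theta_0)$ and $Q^{(1,0)}(\theta_0|\theta_0)$ have opposite
signs when $\delta$ is small enough. Therefore we have
established that $\hat\theta_Q - \theta_0 = O(\delta)$, and consequently
we can express

(59)  $\hat\theta_Q - \theta_0 = B\delta + C\delta^2 + O(\delta^3),$

where $B$ and $C$ are $O(1)$ as $\delta \to 0$ and are to be
determined. To determine $B$ and $C$, we first note that

(60)  $\ell_{ob}'(\theta_0) = -I_{ob}\delta + \dfrac{\ell_{ob}^{(3)}}{2}\delta^2 + O(\delta^3)$

and

(61)  $0 = Q^{(1,0)}(\hat\theta_Q|\theta_0)$
$\qquad = \ell_{ob}'(\theta_0) + G^{(2)}(\theta_0)(\hat\theta_Q - \theta_0) + \dfrac{G^{(3)}(\theta_0)}{2}(\hat\theta_Q - \theta_0)^2 + O(\delta^3),$

where $G^{(k)}(\theta) = Q^{(k,0)}(\theta|\theta)$. Substituting (59) and
(60) into (61) and solving for $B$ and $C$, we obtain

(62)  $B = \dfrac{I_{ob}}{G^{(2)}(\theta_0)}$ and $C = -\dfrac{\ell_{ob}^{(3)} + B^2 G^{(3)}(\theta_0)}{2G^{(2)}(\theta_0)}.$

Noting that $G^{(1)}(\theta_0) = \ell_{ob}'(\theta_0)$ and (60), we then
obtain

$Q(\hat\theta_Q|\theta_0) - Q(\theta_0|\theta_0)$
$\quad = G^{(1)}(\theta_0)(\hat\theta_Q - \theta_0) + \dfrac{1}{2}G^{(2)}(\theta_0)(\hat\theta_Q - \theta_0)^2 + \dfrac{1}{6}G^{(3)}(\theta_0)(\hat\theta_Q - \theta_0)^3 + O(\delta^4)$
$\quad = \left[-I_{ob}B + \dfrac{1}{2}B^2 G^{(2)}(\theta_0)\right]\delta^2 + \left[\dfrac{\ell_{ob}^{(3)}}{2}B - I_{ob}C + BC\,G^{(2)}(\theta_0) + \dfrac{1}{6}B^3 G^{(3)}(\theta_0)\right]\delta^3 + O(\delta^4)$
$\quad = -\dfrac{I_{ob}^2}{2G^{(2)}(\theta_0)}\delta^2 + \left[\dfrac{\ell_{ob}^{(3)} I_{ob}}{2G^{(2)}(\theta_0)} + \dfrac{I_{ob}^3 G^{(3)}(\theta_0)}{6[G^{(2)}(\theta_0)]^3}\right]\delta^3 + O(\delta^4).$

Combining this expansion with

$G^{(2)}(\theta_0) = Q^{(2,0)} + [Q^{(3,0)} + Q^{(2,1)}]\delta + O(\delta^2)$

and applying Lemma A.2, we obtain

$Q(\hat\theta_Q|\theta_0) - Q(\theta_0|\theta_0)$
$\quad = -\dfrac{I_{ob}^2}{2Q^{(2,0)}}\delta^2 + \left[\dfrac{I_{ob}^2 (Q^{(3,0)} + Q^{(2,1)})}{2[Q^{(2,0)}]^2} + \dfrac{\ell_{ob}^{(3)} I_{ob}}{2Q^{(2,0)}} + \dfrac{I_{ob}^3 Q^{(3,0)}}{6[Q^{(2,0)}]^3}\right]\delta^3 + O(\delta^4).$

By Lemma A.2, the above equation and (52) together
imply that $\mathcal{RI}_0$ of (54) has the expansion
(38).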